


ON THE CONSTRUCTION OF GROUP DIVISIBLE INCOMPLETE BLOCK
DESIGNS¹


By R. C. BOSE, S. S. SHRIKHANDE AND K. N. BHATTACHARYA


University of North Carolina, Nagpur College of Science and University of Kansas, 
and University of Calcutta 


1. Summary. It has been shown in [1] that all partially balanced incomplete
block (PBIB) designs with two associate classes can be divided into a small
number of types according to the nature of the association relations among the
treatments. One simple and important type is the group divisible (GD). The
combinatorial properties of GD designs have been studied in [2], and the analysis,
along with that for other types, is given in [1]. Here we give methods of
constructing GD designs. These designs are likely to prove useful in agricultural,
genetic and industrial experiments.


2. Introduction. An incomplete block design with v treatments, each replicated
r times in b blocks of size k, is said to be group divisible (GD) if the treatments
can be divided into m groups, each with n treatments, so that treatments
belonging to the same group occur together in $\lambda_1$ blocks and treatments
belonging to different groups occur together in $\lambda_2$ blocks. If $\lambda_1 = \lambda_2 = \lambda$ (say), then
every pair of treatments occurs together in $\lambda$ blocks and the design reduces to
the well known balanced incomplete block (BIB) design.


It has been shown in [2] that the parameters v, b, r, k, m, n, $\lambda_1$ and $\lambda_2$ satisfy
the following relations and inequalities.

(2.0)    $v = mn, \quad bk = vr$
(2.1)    $\lambda_1(n - 1) + \lambda_2 n(m - 1) = r(k - 1)$
(2.2)    $Q = r - \lambda_1 \ge 0, \quad P = rk - v\lambda_2 \ge 0.$


The GD designs were divided into three classes: (a) Singular GD designs charac- 
terized by Q = 0, (b) Semi-regular GD designs characterized by Q > 0, P = 0, 
(c) Regular GD designs characterized by Q > 0, P > 0. The combinatorial 
properties of each class were separately studied. These will be referred to at 
appropriate places so far as they are relevant to the problem of construction of 
GD designs, which is the main concern of this paper. We shall confine ourselves 
to the practically useful range v = 10, $r \le 10$, $k \le 10$, and choose $\lambda_1$ and $\lambda_2$
not to exceed 3, except for a few singular and semi-regular designs of special
interest.
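The relations (2.0)-(2.2) and the three-way classification can be checked mechanically. The following is a minimal sketch; the function name `classify_gd` is hypothetical, and the three parameter sets are taken from designs discussed later in the paper (design (11) of Table IV, design (5.3), and the case m = 3, n = 2, $\lambda_2 = 1$ of (7.1)).

```python
def classify_gd(v, b, r, k, m, n, lam1, lam2):
    """Check relations (2.0)-(2.2) and return the class of the GD design."""
    if v != m * n or b * k != v * r:
        raise ValueError("relation (2.0) fails")
    if lam1 * (n - 1) + lam2 * n * (m - 1) != r * (k - 1):
        raise ValueError("relation (2.1) fails")
    Q = r - lam1
    P = r * k - v * lam2
    if Q < 0 or P < 0:
        raise ValueError("inequalities (2.2) fail")
    if Q == 0:
        return "singular"
    return "semi-regular" if P == 0 else "regular"


print(classify_gd(26, 13, 4, 8, 13, 2, 4, 1))   # design (11) of Table IV
print(classify_gd(15, 15, 4, 4, 5, 3, 0, 1))    # design (5.3)
print(classify_gd(6, 4, 2, 3, 3, 2, 0, 1))      # (7.1) with m = 3, n = 2, lambda_2 = 1
```

Passing these checks is only a necessary condition: it says nothing about whether a design with the given parameters actually exists.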

As noted in [1], GD designs, besides being a sub-class of PBIB designs [3], [4]
with two associate classes, can also be regarded as a sub-class of inter- and
intra-group balanced incomplete block (IIGBI) designs [5].

Received 10/14/52. 

¹ This research was supported in part by the United States Air Force under Contract
AF18(600)-83 monitored by the Office of Scientific Research.








3. Some types of balanced incomplete block (BIB) designs. 

(a) The construction of GD designs can in many instances be made to depend 
on known solutions for BIB designs [6], [7], [8], [9]. We shall here bring together 
certain results with a view to subsequent use. The parameters of a BIB design 
will be denoted by a starred letter in order to distinguish them from the parame- 
ters of GD designs. Thus the number of treatments will be denoted by v*, the 
number of blocks by b*, the number of replications of each treatment by r*, 
the number of treatments in each block by k*, and the number of times any two 
treatments occur together in a block by $\lambda^*$. The design is said to be resolvable
[10] if the blocks can be grouped in such a way that each group contains a com- 
plete replication. 

(b) The simplest type of BIB design is the unreduced type with $k^* = 2$, the
blocks of which are obtained by taking all possible pairs of t treatments. The
parameters are

(3.0)    $v^* = t, \quad b^* = t(t - 1)/2, \quad r^* = t - 1, \quad k^* = 2, \quad \lambda^* = 1.$

We shall later use the fact that when t is even, the solution can be expressed
in a resolvable form. For example, if t = 6, then we can write the 15 blocks as

           (1,4),  (2,3),  (0,∞)
           (2,0),  (3,4),  (1,∞)
(3.1)      (3,1),  (4,0),  (2,∞)
           (4,2),  (0,1),  (3,∞)
           (0,3),  (1,2),  (4,∞)

where the treatments are 0, 1, 2, 3, 4 and ∞, and the three blocks in any
particular row of (3.1) give a complete replication. In the general case when
t = 2u the solution can be generated by developing the initial blocks

(3.2)    $(1, 2u - 2), \ (2, 2u - 3), \ \dots, \ (u - 1, u), \ (0, \infty) \pmod{2u - 1},$

the treatment ∞ remaining unchanged. The designs (3.0) will be referred to as
belonging to series u.
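The development of the initial blocks (3.2) can be sketched as follows. This is an illustration only: the function name `resolvable_pairs` and the label `INF` for the fixed treatment are assumptions made for the example.

```python
INF = "inf"  # label used here for the fixed treatment (the paper's infinity symbol)


def resolvable_pairs(u):
    """Develop the initial blocks (3.2) mod 2u - 1.

    Returns 2u - 1 replications, each a list of u blocks (pairs)."""
    mod = 2 * u - 1
    initial = [(j, mod - j) for j in range(1, u)] + [(0, INF)]

    def shift(x, s):
        return x if x == INF else (x + s) % mod   # INF is left unchanged

    return [[(shift(a, s), shift(b, s)) for a, b in initial] for s in range(mod)]


reps = resolvable_pairs(3)          # t = 6, the case of scheme (3.1)
for rep in reps:
    print(rep)
```

Each printed row is one replication; with u = 3 the rows correspond to the rows of (3.1).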
(c) BIB designs with parameters

(3.3)    $v^* = s^2, \quad b^* = s^2 + s, \quad r^* = s + 1, \quad k^* = s, \quad \lambda^* = 1$

may be said to belong to the orthogonal series 1 (OS 1). They are also called
balanced lattices [11], and can be obtained from a complete set of orthogonal
Latin squares [6], [7]. They can, however, be more readily obtained by using
certain difference sets [12] due to one of the authors, which are given in
Table I, and whose use is explained below.

For example, let s = 4. If we develop the difference set for s = 4, mod $(s^2 - 1)$,
we get fifteen blocks of the BIB design

(3.35)    $v^* = 16, \quad b^* = 20, \quad r^* = 5, \quad k^* = 4, \quad \lambda^* = 1.$







They are given by the columns of the scheme

            1   2   3   4   5   6   7   8   9  10  11  12  13  14   0
(3.4)       3   4   5   6   7   8   9  10  11  12  13  14   0   1   2
            4   5   6   7   8   9  10  11  12  13  14   0   1   2   3
           12  13  14   0   1   2   3   4   5   6   7   8   9  10  11

The remaining blocks are obtained by starting with the block $(0, s + 1, 2(s + 1), \infty)$
and deriving other blocks by adding 1, 2, ..., s to the treatments of this
block, remembering that ∞ is invariant under addition. Thus 5 other blocks are
given by the columns of the scheme

            0   1   2   3   4
(3.45)      5   6   7   8   9
           10  11  12  13  14
            ∞   ∞   ∞   ∞   ∞


TABLE I
Difference sets for generating BIB designs belonging to the orthogonal series OS 1

  s    Difference set                  Modulus

  2    ...                             mod (3)
  3    ...                             mod (8)
  4    1, 3, 4, 12                     mod (15)
  5    ..., 17, 20                     mod (24)
  7    ..., 11, 31, 36, ...            mod (48)
  8    ..., 14, 38, 48, 49, ...        mod (63)
  9    ..., 48, 49, 66, 72, ...        mod (80)


The design is resolvable, the ith replication being obtained by taking the ith
block from (3.45), and the ith and every succeeding (s + 1)st block from (3.4).
We may thus rearrange the twenty blocks and get the design in the form, where
the replications are separated by vertical lines,

            1  6 11  0 |  2  7 12  1 |  3  8 13  2 |  4  9 14  3 |  5 10  0  4
(3.5)       3  8 13  5 |  4  9 14  6 |  5 10  0  7 |  6 11  1  8 |  7 12  2  9
            4  9 14 10 |  5 10  0 11 |  6 11  1 12 |  7 12  2 13 |  8 13  3 14
           12  2  7  ∞ | 13  3  8  ∞ | 14  4  9  ∞ |  0  5 10  ∞ |  1  6 11  ∞
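The construction for s = 4 can be checked numerically. The sketch below assumes the difference set (1, 3, 4, 12) mod 15, read off the first column of scheme (3.4); the label `INF` and the variable names are illustrative only.

```python
from itertools import combinations
from collections import Counter

INF = "inf"                      # stands for the fixed treatment (infinity)
s = 4
mod = s * s - 1
dset = (1, 3, 4, 12)             # first column of scheme (3.4)

# Fifteen blocks developed from the difference set, as in (3.4)
cyclic = [tuple((x + i) % mod for x in dset) for i in range(mod)]
# Five further blocks (0, s+1, 2(s+1), INF) + j, as in (3.45)
extra = [tuple(j + i * (s + 1) for i in range(s - 1)) + (INF,) for j in range(s + 1)]

# i-th replication: every (s+1)-st block of (3.4) from the i-th on, plus the i-th block of (3.45)
reps = [[cyclic[i + j * (s + 1)] for j in range(s - 1)] + [extra[i]] for i in range(s + 1)]

# Count how often each unordered pair of treatments occurs together
pairs = Counter(frozenset(p) for blk in cyclic + extra for p in combinations(blk, 2))
print(len(pairs), set(pairs.values()))
print(all(len({x for blk in rep for x in blk}) == s * s for rep in reps))
```

The first line printed gives the number of distinct treatment pairs covered and the set of their frequencies; the second reports whether every replication contains all sixteen treatments.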


(d) BIB designs with parameters

(3.6)    $v^* = b^* = s^2 + s + 1, \quad r^* = k^* = s + 1, \quad \lambda^* = 1$

may be said to belong to the orthogonal series 2 (OS 2). The solution for any
design of OS 2 can be obtained from the corresponding design of OS 1 by taking
s + 1 new treatments, adding the ith new treatment to each block of the
ith replication, and finally adding a new block containing all the new treatments.
A solution is, however, more readily obtained by using the difference
sets due to Singer [13], which are given in Table II.

Thus the blocks for the BIB design

(3.65)    $v^* = b^* = 13, \quad r^* = k^* = 4, \quad \lambda^* = 1,$


obtained by using the difference set corresponding to s = 3, are given by the
columns of the scheme

            0   1   2   3   4   5   6   7   8   9  10  11  12
(3.7)       1   2   3   4   5   6   7   8   9  10  11  12   0
            3   4   5   6   7   8   9  10  11  12   0   1   2
            9  10  11  12   0   1   2   3   4   5   6   7   8


TABLE II
Difference sets for generating BIB designs belonging to the orthogonal series OS 2

  s    Difference set                                  Modulus

  2    0, ...                                          mod (7)
  3    0, 1, 3, 9                                      mod (13)
  4    0, ..., 16                                      mod (21)
  5    0, ..., 12, 18                                  mod (31)
  7    0, ..., 32, 36, 43, 52                          mod (57)
  8    0, ..., 15, 31, 36, 54, 63                      mod (73)
  9    0, ..., 9, 27, 49, 56, 61, 77, 81               mod (91)
 11    0, ..., 20, 34, 38, 81, 88, 94, 104, 109        mod (133)


(e) BIB designs which are resolvable or (and) for which $\lambda = 1$ are especially
important for the construction of GD designs. We present in Table III designs
of this type for which $r^* \le 11$, and which do not belong to the series u, OS 1
or OS 2 already considered. The reference to the series is in the notation used
in [8]. In each case the complete solution can be developed from certain initial
blocks.

Designs marked † are resolvable. For (2), (5) and (9) the initial blocks provide
a complete replication. Hence, in developing, the replications remain separate.
For (6) the first seven blocks provide a complete replication and when
developed yield replications I-VII. Each of the other three initial blocks when
developed yields a complete replication. The solutions given here have been
taken or adapted from [8], [14] and [15]. In developing the initial blocks the
suffixes and ∞ should be kept invariant. (For the use of binary symbols see
Section 6.)







TABLE III
BIB designs which are resolvable or (and) for which λ = 1

Serial no.     v*   b*   r*   k*   λ*   Initial blocks                                      Modulus
and series

(1)  T_2       13   26    6    3    1   (1,3,9), (2,6,5)                                    mod (13)

(2)† T_1       15   35    7    3    1   (1_1,2_1,4_1), (3_1,1_2,5_2), (6_1,2_2,3_2),        mod (7)
                                        (5_1,4_2,6_2), (0_1,0_2,∞)

(3)  F_1       25   50    8    4    1   (00,01,41,13), (00,32,21,02)                        mod (5, 5)

(4)  T_2       19   57    9    3    1   (1,7,11), (2,14,3), (4,9,6)                         mod (19)

(5)† F_2       28   63    9    4    1   (01_1,02_1,10_2,20_2), (21_1,12_1,22_2,11_2),       mod (3, 3)
                                        (01_2,02_2,10_3,20_3), (21_2,12_2,22_3,11_3),
                                        (01_3,02_3,10_1,20_1), (21_3,12_3,22_1,11_1),
                                        (00_1,00_2,00_3,∞)

(6)† T_1       21   70   10    3    1   (0_1,0_2,0_3), (1_1,2_1,4_1), (1_2,2_2,4_2),        mod (7)
                                        (1_3,2_3,4_3), (3_1,5_2,6_3), (3_2,5_3,6_1),
                                        (3_3,5_1,6_2) Reps I-VII; (1_1,2_3,4_2)
                                        Rep VIII; (1_2,2_1,4_3) Rep IX;
                                        (1_3,2_2,4_1) Rep X

(7)  G_1       41   82   10    5    1   (1,37,16,18,10), (8,9,5,21,39)                      mod (41)

(8)  G_2       45   99   11    5    1   (01_1,02_1,10_3,20_3,00_2),                         mod (3, 3)
                                        (21_1,12_1,22_3,11_3,00_2),
                                        (01_2,02_2,10_4,20_4,00_3),
                                        (21_2,12_2,22_4,11_4,00_3),
                                        (01_3,02_3,10_5,20_5,00_4),
                                        (21_3,12_3,22_5,11_5,00_4),
                                        (01_4,02_4,10_1,20_1,00_5),
                                        (21_4,12_4,22_1,11_1,00_5),
                                        (01_5,02_5,10_2,20_2,00_1),
                                        (21_5,12_5,22_2,11_2,00_1),
                                        (00_1,00_2,00_3,00_4,00_5)

(9)† (B)        8   14    7    4    3   (∞,1,2,4), (3,5,6,0)                                mod (7)


4. Construction of singular GD designs. It has been shown in [2] that if in a
BIB design with parameters $v^*$, $b^*$, $r^*$, $k^*$, $\lambda^*$ we replace each treatment by a
group of n treatments, we get a singular GD design with parameters

(4.0)    $v = nv^*, \quad b = b^*, \quad r = r^*, \quad k = nk^*,$
         $m = v^*, \quad n = n, \quad \lambda_1 = r^*, \quad \lambda_2 = \lambda^*.$

Conversely, every singular GD design is obtainable in this way from a
corresponding BIB design. The problem of constructing singular GD designs,
therefore, offers no difficulty. However, if $r^*$ and $\lambda^*$ differ too much, then in the
derived GD design the accuracy of the within-group and between-group
comparisons will differ appreciably. We give in Table IV some cases of practical
interest.

A singular GD design may be considered to belong to the same series as the 
corresponding BIB design. The series has been shown along with the serial 
number in Table IV. It is clear that if a BIB design is resolvable the same is 
true of a GD design derived from it. Resolvability has been denoted by †.

As an example consider design (11) of Table IV. The blocks of the corresponding
BIB design are given by (3.7). Replacing each treatment $i$ by two treatments


TABLE IV
Parameters of some singular GD designs, and the corresponding BIB designs from
which they are derivable

Serial no.       Parameters of BIB design      Parameters of corresponding
and series                                     singular GD design
                 v*   b*   r*   k*   λ*         v    r    k    n

(1)  u†           6   15    5    2    1        12    5    4    2
(2)  u†           8   28    7    2    1        16    7    4    2
(3)  u†          10   45    9    2    1        20    9    4    2
(4)  OS 2         7    7    3    3    1        14    3    6    2
(5)  OS 2         7    7    3    3    1        21    3    9    3
(6)               5   10  ...  ...  ...        10  ...  ...    2
(7)               5   10  ...  ...  ...        15  ...  ...    3
(8)               5   10  ...  ...  ...        20  ...  ...    4
(9)  OS 1†        9   12    4    3    1        18    4    6    2
(10) OS 1†        9   12    4    3    1        27    4    9    3
(11) OS 2        13   13    4    4    1        26    4    8    2


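$i_1$ and $i_2$, we see that the blocks of the GD design under consideration are
given by the columns of the scheme

     0_1   1_1   2_1   3_1   4_1   5_1   6_1   7_1   8_1   9_1  10_1  11_1  12_1
     0_2   1_2   2_2   3_2   4_2   5_2   6_2   7_2   8_2   9_2  10_2  11_2  12_2
     1_1   2_1   3_1   4_1   5_1   6_1   7_1   8_1   9_1  10_1  11_1  12_1   0_1
     1_2   2_2   3_2   4_2   5_2   6_2   7_2   8_2   9_2  10_2  11_2  12_2   0_2
     3_1   4_1   5_1   6_1   7_1   8_1   9_1  10_1  11_1  12_1   0_1   1_1   2_1
     3_2   4_2   5_2   6_2   7_2   8_2   9_2  10_2  11_2  12_2   0_2   1_2   2_2
     9_1  10_1  11_1  12_1   0_1   1_1   2_1   3_1   4_1   5_1   6_1   7_1   8_1
     9_2  10_2  11_2  12_2   0_2   1_2   2_2   3_2   4_2   5_2   6_2   7_2   8_2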


The treatments $i_1$ and $i_2$ belong to the same group ($i = 0, 1, \dots, 12$). They
occur together in the same block four times. Two treatments not belonging to
the same group occur together in a block just once.
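The replacement of each treatment by a group of n treatments can be sketched as follows. The sketch assumes that the BIB design (3.65) is developed from the set (0, 1, 3, 9) mod 13, and writes the treatment $i_c$ as the pair `(i, c)`; both conventions are chosen for the illustration only.

```python
from itertools import combinations
from collections import Counter

# BIB design with v* = b* = 13, r* = k* = 4, lambda* = 1 (assumed base set 0, 1, 3, 9)
bib = [tuple((x + i) % 13 for x in (0, 1, 3, 9)) for i in range(13)]

n = 2
# Replace each treatment t by the n treatments (t, 1), ..., (t, n)
gd = [tuple((t, c) for t in blk for c in range(1, n + 1)) for blk in bib]

counts = Counter(frozenset(p) for blk in gd for p in combinations(blk, 2))
same = {c for p, c in counts.items() if len({t for t, _ in p}) == 1}   # same group
diff = {c for p, c in counts.items() if len({t for t, _ in p}) == 2}   # different groups
print(same, diff)
```

The two printed sets are the observed values of $\lambda_1$ and $\lambda_2$, which by (4.0) should be $r^*$ and $\lambda^*$ respectively.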







5. Method of "omitting varieties" for the generation of GD designs. Consider
a BIB design with parameters

(5.0)    $v^*, \quad b^*, \quad r^*, \quad k^*, \quad \lambda^* = 1.$

A particular treatment $\theta$ occurs in $r^*$ blocks. The remaining $v^* - 1 = r^*(k^* - 1)$
treatments can be divided into $r^*$ groups, each containing $k^* - 1$ treatments,
two treatments belonging to the same group if they occur together in the same
block with $\theta$. If we form a new design by omitting the treatment $\theta$ and all the
blocks containing it, we evidently get a GD design with parameters

(5.1)    $v = v^* - 1, \quad b = b^* - r^*, \quad r = r^* - 1, \quad k = k^*,$
         $m = r^*, \quad n = k^* - 1, \quad \lambda_1 = 0, \quad \lambda_2 = 1.$

THEOREM 1. By omitting a particular treatment $\theta$ from a BIB design with
parameters (5.0), we obtain a GD design with parameters (5.1). Two treatments belong to
the same group if they occur together in the same block as $\theta$.
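Theorem 1 can be illustrated on the BIB design with $v^* = 13$, $k^* = 3$ developed from the initial blocks (1,3,9), (2,6,5) mod 13. The sketch below is a minimal check; the choice `theta = 0` for the omitted treatment is an assumption made for the example, and any other treatment would serve equally well.

```python
from itertools import combinations
from collections import Counter


def develop(block, mod):
    """All blocks obtained by adding 0, 1, ..., mod - 1 to an initial block."""
    return [tuple((x + i) % mod for x in block) for i in range(mod)]


bib = develop((1, 3, 9), 13) + develop((2, 6, 5), 13)   # 26 blocks, lambda* = 1
theta = 0                                               # omitted treatment (assumed choice)

# Groups: what remains of the blocks through theta; GD blocks: blocks avoiding theta
groups = [tuple(x for x in blk if x != theta) for blk in bib if theta in blk]
gd = [blk for blk in bib if theta not in blk]

group_of = {x: g for g, members in enumerate(groups) for x in members}
counts = Counter(frozenset(p) for blk in gd for p in combinations(blk, 2))
print(len(gd), groups)
```

With these conventions `counts` should be 0 on pairs from the same group and 1 on all other pairs, in agreement with (5.1).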

In particular, if we start with a BIB design belonging to the orthogonal
series OS 1, with parameters given by (3.3), we get a series of regular GD
designs with parameters

(5.2)    $v = b = s^2 - 1, \quad r = k = s, \quad m = s + 1, \quad n = s - 1, \quad \lambda_1 = 0, \quad \lambda_2 = 1.$

The method of obtaining the blocks of a design of OS 1 using the difference
sets in Table I has already been explained. To get the corresponding design of
(5.2), it is convenient to omit the treatment ∞. Thus taking s = 4, the blocks of
the GD design

(5.3)    $v = b = 15, \quad r = k = 4, \quad m = 5, \quad n = 3, \quad \lambda_1 = 0, \quad \lambda_2 = 1$

are given by the columns of the scheme (3.4), and the groups are given by the
columns of (3.45), if the last row containing only ∞ is omitted.


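The BIB designs (1)-(8) of Table III may also be employed to generate
corresponding GD designs. For example the blocks of

(5.4)    $v^* = 13, \quad b^* = 26, \quad r^* = 6, \quad k^* = 3, \quad \lambda^* = 1$

obtained by developing the initial blocks given in Table III are

            1  2  3  4  5  6  7  8  9 10 11 12  0    2  3  4  5  6  7  8  9 10 11 12  0  1
(5.5)       3  4  5  6  7  8  9 10 11 12  0  1  2    6  7  8  9 10 11 12  0  1  2  3  4  5
            9 10 11 12  0  1  2  3  4  5  6  7  8    5  6  7  8  9 10 11 12  0  1  2  3  4

On omitting the treatment 0, the blocks of the GD design

(5.6)    $v = 12, \quad b = 20, \quad r = 5, \quad k = 3, \quad m = 6, \quad n = 2, \quad \lambda_1 = 0, \quad \lambda_2 = 1$

are given by the columns of the scheme

            1  2  3  4  6  7  8  9 10 12    2  3  4  5  6  7  8 11 12  1
(5.7)       3  4  5  6  8  9 10 11 12  1    6  7  8  9 10 11 12  2  3  5
            9 10 11 12  1  2  3  4  5  7    5  6  7  8  9 10 11  1  2  4

and the groups are given by the columns in

(5.8)       5 11  2  9 10  4
            7  6  8 12  1  3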


We give in Table V parameters of BIB designs with $\lambda = 1$, together with the
parameters of GD designs derivable from them by omitting a variety. Designs of
the orthogonal series OS 2, and the semi-regular GD designs derivable from them,
have not been included, as the latter will be obtained in Section 7 as members
of a more general class.


TABLE V
Parameters of BIB designs with λ = 1 not belonging to the series OS 2, and GD
designs derivable from them by "omitting varieties"

Serial no.      Parameters of BIB design      Parameters of GD design
and series      v*   b*   r*   k*   λ*         v    b    r    k    m    n  λ_1  λ_2

(1)  OS 1       16   20    5    4    1        15   15    4    4    5    3    0    1
(2)  OS 1       25   30    6    5    1        24   24    5    5    6    4    0    1
(3)  OS 1       49   56    8    7    1        48   48    7    7    8    6    0    1
(4)  OS 1       64   72    9    8    1        63   63    8    8    9    7    0    1
(5)  OS 1       81   90   10    9    1        80   80    9    9   10    8    0    1
(6)  T_2        13   26    6    3    1        12   20    5    3    6    2    0    1
(7)  T_1        15   35    7    3    1        14   28    6    3    7    2    0    1
(8)  F_1        25   50    8    4    1        24   42    7    4    8    3    0    1
(9)  T_2        19   57    9    3    1        18   48    8    3    9    2    0    1
(10) F_2        28   63    9    4    1        27   54    8    4    9    3    0    1
(11) T_1        21   70   10    3    1        20   60    9    3   10    2    0    1
(12) G_1        41   82   10    5    1        40   72    9    5   10    4    0    1
(13) G_2        45   99   11    5    1        44   88   10    5   11    4    0    1


6. Method of differences for generating GD designs. 

(a) The method of differences has been extensively used in [8] and [9] for the
construction of BIB designs. We shall here adapt it to the construction of GD
designs. Consider a module M with a finite number g of elements. To each
element let there correspond h treatments, the treatments corresponding to the
element x being

(6.0)    $x_1, x_2, \dots, x_h.$

Thus there are v = gh treatments. Treatments denoted by symbols with the same
lower suffix i may be said to belong to the ith class.

Let $x'_i$ and $x''_j$ be two different treatments of the ith and jth classes
respectively, where $x'$ and $x''$ are elements of M. Let

(6.1)    $x' - x'' = d, \quad x'' - x' = -d.$

We then say that the pair of treatments $x'_i$ and $x''_j$ give rise to the difference
$d$ of the type $[i, j]$ and the difference $-d$ of the type $[j, i]$. When $i = j$ the
differences are called "pure" and when $i \ne j$ the differences are called "mixed". The
differences $d$ of the type $[i, j]$ and $-d$ of the type $[j, i]$ are said to be "complementary"
to one another. Thus every pair of treatments gives rise to a pair of
complementary differences, one difference corresponding to each order of writing the
treatments. Clearly there are h different types of pure differences and h(h - 1)
different types of mixed differences. Since every nonzero element of M can appear
in a pure difference, and every element (zero or nonzero) in a mixed difference,
the total number of different possible differences is

(6.15)    $h(g - 1) + h(h - 1)g = v(v - 1)/g.$

If $\theta$ is an arbitrary element of M and

(6.2)    $y' = x' + \theta, \quad y'' = x'' + \theta,$

then the pair of treatments $y'_i$ and $y''_j$ give rise to the same pair of
complementary differences as $x'_i$ and $x''_j$. Since $\theta$ can take g different values, we get
g pairs of treatments giving rise to differences $d$ and $-d$ of types $[i, j]$ and $[j, i]$
respectively, and it is easy to see that there are no other treatment pairs which
give rise to the same differences. The v(v - 1)/2 treatment pairs thus give rise
to just v(v - 1)/g differences, which checks with (6.15).

Given an initial block B containing k treatments we can get g blocks by
developing it in the following manner. Let $\theta$ be any arbitrary element of M. Then
we get a new block $B_\theta$ corresponding to $\theta$ by replacing each treatment $x_i$ in
the initial block by $x'_i$, where $x' = x + \theta$. By varying $\theta$ we get all the g required
blocks. The initial block B gives rise to k(k - 1) differences, namely the
differences which arise from the k(k - 1)/2 pairs of treatments which can be formed
from the treatments in B. If any pair of treatments occurs in B, then all the g
pairs of treatments which give rise to the same differences as the given pair
occur in the corresponding positions in the blocks developed from B.
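The notions of typed differences and of developing a block can be made concrete. In the sketch below the module M is assumed to be the residues mod g, a treatment $x_i$ is written as the pair `(x, i)`, and the function names are hypothetical.

```python
from collections import Counter


def differences(block, g):
    """All k(k-1) ordered differences of a block over the residues mod g.

    The result maps (d, (i, j)) to the number of times the difference d of
    type [i, j] arises; i == j gives pure, i != j mixed differences."""
    out = Counter()
    for a, (x, i) in enumerate(block):
        for b, (y, j) in enumerate(block):
            if a != b:
                out[((x - y) % g, (i, j))] += 1
    return out


def develop(block, g):
    """The g blocks obtained by adding each element theta of M to the block."""
    return [tuple(((x + th) % g, i) for x, i in block) for th in range(g)]


def possible_differences(g, h):
    """Left-hand side of (6.15)."""
    return h * (g - 1) + h * (h - 1) * g


B = ((1, 1), (2, 1), (4, 1), (0, 2))        # the block (1_1, 2_1, 4_1, 0_2) mod 7
print(sorted(differences(B, 7).items()))
print(possible_differences(7, 2))
```

Since adding $\theta$ to both members of a pair leaves their difference unchanged, every block returned by `develop` has the same difference table as the initial block.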

(b) THEOREM 2. Let M be a module with m elements and to each element of M let
there correspond n treatments. Let it be possible to find t initial blocks

         $B_1, B_2, \dots, B_t$

each containing k treatments, and an initial group G containing n treatments such
that

(i) the n(n - 1) differences arising from G are all different, and

(ii) among the k(k - 1)t differences arising from the initial blocks each difference
occurs $\lambda_2$ times, except those which arise from G, each of which occurs $\lambda_1$ times.

Then by developing the initial blocks $B_1, B_2, \dots, B_t$ we get the GD design
with parameters v = mn, b = mt, r = kt/n, k, m, n, $\lambda_1$, $\lambda_2$, the groups being
obtained by developing the initial group G.

PROOF. Two treatments belong to the same group if and only if the differences
arising from them occur among those arising from G. By the conditions
of the theorem and what has been said before, any such pair will occur among
the developed blocks $\lambda_1$ times, and all other pairs will occur $\lambda_2$ times. Also in the
developed blocks each treatment must occur in $(n - 1)\lambda_1 + n(m - 1)\lambda_2$ pairs.
But if this treatment occurs in r blocks then this number of pairs is also r(k - 1).
Hence r must be the same for all treatments and is given by

(6.25)    $(n - 1)\lambda_1 + n(m - 1)\lambda_2 = r(k - 1).$

Again the total number of pairs in all the developed blocks is mk(k - 1)t/2 and
this must equal mnr(k - 1)/2 since each treatment occurs in r(k - 1) pairs.
Hence r = kt/n. This completes the proof.

In particular let M be the module of residue classes mod (m), and let the
initial group G consist of treatments

(6.3)    $0_1, 0_2, \dots, 0_n.$

Then to get a GD design with parameters v, b, r, k, m, n, $\lambda_1$, $\lambda_2$ we have to find
t initial blocks such that among the k(k - 1)t differences arising from these
blocks each pure difference and each nonzero mixed difference arises just $\lambda_2$ times,
and each zero mixed difference arises $\lambda_1$ times. The designs (1)-(7) of Table VI
have been obtained by using this special case of Theorem 2. For example, for
design (2) of Table VI, the complete set of blocks obtained by developing the
given initial blocks mod (7) is given by the columns of the scheme

           1_1  2_1  3_1  4_1  5_1  6_1  0_1  1_2  2_2  3_2  4_2  5_2  6_2  0_2
(6.4)      2_1  3_1  4_1  5_1  6_1  0_1  1_1  2_2  3_2  4_2  5_2  6_2  0_2  1_2
           4_1  5_1  6_1  0_1  1_1  2_1  3_1  4_2  5_2  6_2  0_2  1_2  2_2  3_2
           0_2  1_2  2_2  3_2  4_2  5_2  6_2  0_1  1_1  2_1  3_1  4_1  5_1  6_1

The groups obtained by developing the initial group are given by the columns
of the scheme

(6.5)      0_1  1_1  2_1  3_1  4_1  5_1  6_1
           0_2  1_2  2_2  3_2  4_2  5_2  6_2
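The special case of Theorem 2 can be checked on design (2) of Table VI. The sketch below takes the initial blocks $(1_1, 2_1, 4_1, 0_2)$ and $(1_2, 2_2, 4_2, 0_1)$ mod 7, writes $x_i$ as `(x, i)`, and first tests the difference condition, then counts pairs in the developed design. Variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

g = 7
blocks0 = [((1, 1), (2, 1), (4, 1), (0, 2)), ((1, 2), (2, 2), (4, 2), (0, 1))]

# Typed differences (d, i, j) arising from the initial blocks
diffs = Counter(((x - y) % g, i, j)
                for B in blocks0 for (x, i) in B for (y, j) in B if (x, i) != (y, j))

# Condition of the special case: pure and nonzero mixed differences occur
# lambda_2 = 1 times, zero mixed differences lambda_1 = 0 times.
condition = all(diffs[(d, i, j)] == (0 if d == 0 else 1)
                for i in (1, 2) for j in (1, 2) for d in range(g)
                if not (d == 0 and i == j))

# Develop the initial blocks and count how often each pair occurs together
design = [tuple(((x + th) % g, i) for x, i in B) for B in blocks0 for th in range(g)]
pairs = Counter(frozenset(p) for blk in design for p in combinations(blk, 2))
print(condition, len(design))
```

Two treatments lie in the same group exactly when they share the same element of M, so `pairs` should vanish on pairs `(x, 1)`, `(x, 2)` and equal 1 elsewhere.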


(c) The scope of the method of differences can be further extended by using
the concept of "partial cycle" (P.C.) (cf. [14]). We shall illustrate the use of
this concept by considering a specific example. 

Let M be the module of residue classes mod (15), and to each element of M 
let there correspond a unique treatment. Consider the set of treatments 
(0,3,6,9,12). This set cannot form an initial group for the purposes of Theorem 
2, since the differences arising are not all different but are the elements 3,6,9, 
12 each repeated five times. We however note that if we develop this set, then 
the complete cycle of 15 sets consists of the three sets (0,3,6,9,12), (1,4,7,10,13) 
and (2,5,8,11,14) each repeated five times. We can therefore say that the complete 
cycle is divisible into 5 equal parts. If we take only a partial cycle, namely 1/5 of





TABLE VI
GD designs which can be generated by the method of differences

Serial    v   b   r   k     Initial group               Initial blocks                           Modulus
no.       m   n λ_1 λ_2

(1)      14  28   6   3     (0_1,0_2)                   (1_1,6_1,0_2), (2_1,5_1,0_2),            mod (7)
          7   2   0   1                                 (3_1,4_1,0_2), (1_2,2_2,4_2)

(2)      14  14   4   4     (0_1,0_2)                   (1_1,2_1,4_1,0_2), (1_2,2_2,4_2,0_1)     mod (7)
          7   2   0   1

(3)      26  52   8   4     (0_1,0_2)                   (1_1,3_1,9_1,0_2), (2_1,6_1,5_1,0_2),    mod (13)
         13   2   0   1                                 (1_2,3_2,9_2,0_1), (2_2,6_2,5_2,0_1)

(4)      18  54   9   3     (0_1,0_2)                   (0_1,3_2,1_2), (0_1,4_2,0_2),            mod (9)
          9   2   2   1                                 (0_1,5_2,8_2), (0_1,6_2,7_2),
                                                        (0_1,1_1,4_1), (0_1,2_1,2_2)

(5)      30  75  10   4     (0_1,0_2)                   (0_1,2_1,14_1,4_2),                      mod (15)
         15   2   2   1                                 (0_2,2_2,14_2,4_1),
                                                        (0_1,4_1,10_1,1_2),
                                                        (0_2,4_2,10_2,1_1),
                                                        (0_1,8_1,0_2,8_2)

(6)      39  78  10   5     (0_1,0_2,0_3)               (1_1,3_1,9_1,0_2,0_3),                   mod (13)
         13   3   2   1                                 (2_1,6_1,5_1,0_2,0_3),
                                                        (1_2,3_2,9_2,0_3,0_1),
                                                        (2_2,6_2,5_2,0_3,0_1),
                                                        (1_3,3_3,9_3,0_1,0_2),
                                                        (2_3,6_3,5_3,0_1,0_2)

(7)      10  20   8   4     (0_1,0_2)                   (0_1,1_2,2_2,4_2), (0_2,1_1,2_1,4_1),    mod (5)
          5   2   0   3                                 (0_1,2_2,3_2,4_2), (0_2,2_1,3_1,4_1)

(8)      16  32   6   3     (0,4,8,12)                  (0,1,10), (0,2,5)                        mod (16)
          4   4   0   1     1/4 P.C.

(9)      24  72   9   3     (0,4,8,12,16,20)            (0,1,11), (0,2,7), (0,3,9)               mod (24)
          4   6   0   1     1/6 P.C.

(10)     15  30   6   3     (0,5,10)                    (0,6,8), (0,11,14)                       mod (15)
          5   3   0   1     1/3 P.C.

(11)     15  45   9   3     (0,3,6,9,12)                (0,6,12), (0,3,4), (0,2,7)               mod (15)
          3   5   2   1     1/5 P.C.

(12)     12  12   4   4     (0,6)                       (0,1,4,6)                                mod (12)
          6   2   2   1     1/2 P.C.

(13)     12  36   9   3     (0,4,8)                     (0,1,3), (0,1,6), (0,2,5)                mod (12)
          4   3   0   2     1/3 P.C.

(14)     26  26   9   9     (0,13)                      (0,1,2,8,11,18,20,22,23)                 mod (26)
         13   2   0   3     1/2 P.C.

(15)     35  70  10   5     (00,10,20,30,40,50,60)      (10,20,40,01,04),                        mod (7, 5)
          5   7   2   1     1/7 P.C.                    (10,20,40,02,03)

(16)     33  33   7   7     (00,10,20,30,40,50,         (10,40,50,90,30,01,02)                   mod (11, 3)
          3  11   2   1     60,70,80,90,t0)
                            1/11 P.C.

(17)     15  30   8   4     (00,10,20,30,40)            (00,40,21,22), (00,20,11,12)             mod (5, 3)
          3   5   1   2     1/5 P.C.

(18)     15  30  10   5     (00,10,20)                  (00,10,21,22,24),                        mod (3, 5)
          5   3   2   3     1/3 P.C.                    (00,10,21,22,23)

(19)     24  60  10   4     (00,30,60,90,               (00,10,40,91) C.C.                       mod (12, 2)
          3   8   2   1     01,31,61,91)                (00,20,50,31) C.C.
                            1/8 P.C.                    (00,60,01,61) 1/2 P.C.

(20)     24  80  10   3     (00,20,40,60)               (00,10,61) C.C.                          mod (8, 3)
          6   4   0   1     1/4 P.C.                    (00,50,71) C.C.
                                                        (00,11,42) C.C.
                                                        (00,01,02) 1/3 P.C.

(21)     12  30  10   4     (00,01,30,31)               (00,20,30,11) C.C.                       mod (6, 2)
          3   4   2   3     1/4 P.C.                    (00,10,50,41) C.C.
                                                        (00,20,01,21) 1/2 P.C.





the complete cycle for our groups, we see that any two treatments, the differences 
arising from which are 3,12 or 6,9, occur together just once in a group. 
We now note that among the 18 differences arising from the initial blocks 


(6.6) (0,6,12), (0,3,4), (0,2,7) 


the elements 3, 6, 9, 12 each occur twice, and the other nonzero elements, namely
1, 2, 4, 5, 7, 8, 10, 11, 13, 14, each occur once. If, therefore, we develop these initial
blocks mod (15) we get design (11) of Table VI, the groups consisting of 1/5 of
the complete cycle obtained by developing the initial group (0,3,6,9,12). This
is denoted by writing 1/5 P.C. after (0,3,6,9,12) in column 3 of Table VI.

We may now state the following obvious generalization of Theorem 2. 

THEOREM 3. Let M be a module with cm elements and to each element of M let
there correspond n/c treatments (c is supposed to be a divisor of n). Let it be possible
to find t initial blocks each containing k treatments, and an initial group G
containing n treatments such that:

(i) The differences arising from G consist of n(n - 1)/c different differences each
repeated c times, the complete cycle of G being divisible into c equal parts.

(ii) Among the k(k - 1)t differences arising from the initial blocks each difference
occurs $\lambda_2$ times, except the n(n - 1)/c differences arising from G, each of which occurs
$\lambda_1$ times.

Then by developing the initial blocks $B_1, B_2, \dots, B_t$ we get the GD design
with parameters v = mn, b = mct, r = kct/n, k, m, n, $\lambda_1$, $\lambda_2$, the groups being
the 1/cth part of the complete cycle obtained by developing G.

In particular let c = n, and let M be the module of residue classes mod (mn),
one treatment corresponding to each element of M. Let G be

(6.7)    $(0, m, 2m, \dots, m(n - 1)).$

Then the differences arising from G are the n - 1 elements m, 2m, ..., (n - 1)m,
each repeated n times. The complete cycle of G is divisible into n equal parts
and we can get 1/n part of this cycle by adding 0, 1, ..., m - 1 to the
elements of G and taking residues mod (mn). This gives us the m groups. If it is
possible to find initial blocks $B_1, B_2, \dots, B_t$, each with k treatments, such
that the differences arising from them consist of the elements m, 2m, ...,
(n - 1)m each repeated $\lambda_1$ times, and all other nonzero elements of M each
repeated $\lambda_2$ times, then by developing $B_1, B_2, \dots, B_t$ we get the blocks of the GD
design with parameters v = mn, b = mnt, r = kt, k, m, n, $\lambda_1$, $\lambda_2$. Designs (8)-(14) of
Table VI have all been obtained in this manner.
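For the cyclic case just described, the difference condition is easy to test by machine. The following sketch uses hypothetical function names; the initial blocks in the examples are those quoted for designs (8), (11), (12) and (13).

```python
from collections import Counter


def cyclic_gd_lambdas(blocks, m, n):
    """Test the difference condition for G = (0, m, 2m, ..., (n-1)m) mod mn.

    Returns (lambda_1, lambda_2) if every nonzero multiple of m occurs the same
    number of times (lambda_1) among the differences of the initial blocks and
    every other nonzero residue occurs the same number of times (lambda_2);
    returns None otherwise."""
    mod = m * n
    diffs = Counter((x - y) % mod for B in blocks for x in B for y in B if x != y)
    in_G = {diffs[d] for d in range(1, mod) if d % m == 0}
    others = {diffs[d] for d in range(1, mod) if d % m != 0}
    if len(in_G) == 1 and len(others) == 1:
        return in_G.pop(), others.pop()
    return None


def cyclic_groups(m, n):
    """The m groups: 1/n of the complete cycle of G."""
    return [[(j + i * m) % (m * n) for i in range(n)] for j in range(m)]


print(cyclic_gd_lambdas([(0, 1, 10), (0, 2, 5)], 4, 4))             # design (8)
print(cyclic_gd_lambdas([(0, 6, 12), (0, 3, 4), (0, 2, 7)], 3, 5))  # design (11)
print(cyclic_gd_lambdas([(0, 1, 4, 6)], 6, 2))                      # design (12)
print(cyclic_gd_lambdas([(0, 1, 3), (0, 1, 6), (0, 2, 5)], 4, 3))   # design (13)
print(cyclic_groups(4, 4))
```

A return value of `None` only means that the particular blocks supplied do not satisfy the condition; it is not a statement about the existence of the design.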

(d) In applying the method of differences, the use of systems of double modulus
(u, v) is often advantageous. The elements of such a system are binary symbols
xy, where x is a residue class mod (u) and y is a residue class mod (v). In adding
two elements, we add the components separately and reduce the first component
mod (u) and the second component mod (v).

In applying Theorem 3, using systems of double modulus we shall take u = n, 
v = m, so that M is a system of double modulus (n, m). We shall illustrate by 
considering design (18) of Table VI, where m = 5, n = 3. The initial group G 
is (00,10,20), and consists of all elements of M for which the second component 
is zero. The complete cycle of G consists of 15 groups divisible into 3 equal parts. 
One of these parts is obtained by adding to G all the elements of M for which the
first component is zero. The groups of this "partial cycle" are taken as our groups.
They are given by the columns of

           00  01  02  03  04
(6.8)      10  11  12  13  14
           20  21  22  23  24.

The fact that the groups are obtained by taking only 1/3 of the complete
cycle obtainable from G is denoted by writing 1/3 P.C. after (00,10,20) in column
3 of Table VI. The differences arising from G are all the nonnull elements of
M for which the second component is zero, each repeated 3 times. If we now note







that among the forty differences arising from the initial blocks (00,10,21,22,24),
(00,10,21,22,23) the elements 10, 20 of M each occur twice, and the other nonnull
elements of M each occur thrice, it follows from Theorem 3 that on developing
these initial blocks we shall obtain all the blocks of design (18) of Table VI.
Designs (15)-(18) of Table VI have all been obtained in this manner. In design
(16), t stands for 10.
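Design (18) can be verified directly with a double modulus. In the sketch below an element xy of the system (3, 5) is written as the pair `(x, y)`; the helper name `add` is illustrative.

```python
from collections import Counter
from itertools import combinations, product

n, m = 3, 5                                   # double modulus (n, m) = (3, 5)
elements = list(product(range(n), range(m)))  # all binary symbols xy


def add(a, b):
    """Componentwise addition, first component mod n, second mod m."""
    return ((a[0] + b[0]) % n, (a[1] + b[1]) % m)


blocks0 = [((0, 0), (1, 0), (2, 1), (2, 2), (2, 4)),
           ((0, 0), (1, 0), (2, 1), (2, 2), (2, 3))]

# Complete cycles of both initial blocks: 2 * 15 = 30 blocks
design = [tuple(add(x, th) for x in B) for B in blocks0 for th in elements]
pairs = Counter(frozenset(p) for blk in design for p in combinations(blk, 2))

# Groups are the columns of (6.8): elements with the same second component
same = {pairs[frozenset((a, b))] for a, b in combinations(elements, 2) if a[1] == b[1]}
diff = {pairs[frozenset((a, b))] for a, b in combinations(elements, 2) if a[1] != b[1]}
print(len(design), same, diff)
```

The sets `same` and `diff` hold the observed values of $\lambda_1$ and $\lambda_2$; for design (18) these should be 2 and 3.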

(e) Finally, instead of considering only complete cycles developed from initial
blocks, we may also allow partial cycles. This will be illustrated by considering
design (20) of Table VI. M is here a system of double modulus (8,3). The initial
group G consists of n = 4 elements (00,20,40,60). The differences arising from G
are the elements 20, 40, 60 each occurring four times. For our groups we therefore
take 1/4 part of the complete cycle obtained by developing G. Our blocks
should be such that two treatments differing by ±20 or ±40 should not be in
the same block, but any two treatments the difference of which is anything else
should occur in a block just once. Now the differences arising from the initial
blocks (00,10,61), (00,50,71), (00,11,42) are all the nonnull elements of M (occurring
once) except 20, 40, 60, 01, 02. Hence by developing these initial blocks we would
get all pairs of treatments occurring together except those which differ by ±20,
±40, ±01. We can therefore complete the solution by adding the initial block
(00,01,02) and taking 1/3 of the complete cycle obtainable from it, since the
differences arising from it are 01 and 02 each repeated thrice. Designs (19) and (21)
of Table VI have also been obtained in a similar manner. The letters C.C. after
an initial block mean that we have to take the complete cycle developed from it,
whereas 1/n P.C. after an initial block means that only 1/n part of the complete
cycle has to be taken. Of course this notation has been used only for those
designs in which some of the initial blocks have partial cycles.

It should be noted that Theorem 3, when properly interpreted, remains valid
even when some of the initial blocks have partial cycles. If 1/s part of the cycle
arising from a block is taken, then this block counts only as 1/s blocks, and the
differences arising from it count only as k(k - 1)/s differences (i.e., every set of s
identical differences counts only as one). Thus in design (20) of Table VI, the
number of initial blocks is $t = 3\tfrac{1}{3}$, since only 1/3 of the cycle of the last initial
block is taken. Since to each element there corresponds only one treatment,
c = n, and the relation r = kct/n is seen to remain valid. The k(k - 1)t differences
arising from the initial blocks are the 6 × 3 differences arising from the first three
initial blocks, together with the two differences arising from the last initial
block.
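The use of partial cycles in design (20) can be sketched as follows. The sketch takes, for each initial block, the set of distinct blocks in its cycle; this coincides with the required complete or partial cycle here, though that shortcut is an assumption of the example rather than a general rule. Elements xy of the system (8, 3) are written as pairs `(x, y)`.

```python
from collections import Counter
from itertools import combinations, product

u, w = 8, 3                                    # double modulus (8, 3)
elements = list(product(range(u), range(w)))


def add(a, b):
    return ((a[0] + b[0]) % u, (a[1] + b[1]) % w)


def cycle(block):
    """Distinct blocks in the cycle of an initial block.

    For (00,01,02) the complete cycle repeats each block three times, so the
    distinct blocks form the 1/3 partial cycle."""
    return {frozenset(add(x, th) for x in block) for th in elements}


initial = [((0, 0), (1, 0), (6, 1)), ((0, 0), (5, 0), (7, 1)),
           ((0, 0), (1, 1), (4, 2)), ((0, 0), (0, 1), (0, 2))]
design = [blk for B in initial for blk in cycle(B)]
pairs = Counter(frozenset(p) for blk in design for p in combinations(blk, 2))


def same_group(a, b):
    """Groups are 1/4 of the cycle of G = (00,20,40,60)."""
    return a[1] == b[1] and (a[0] - b[0]) % 2 == 0


print([len(cycle(B)) for B in initial], len(design))
```

The block counts 24 + 24 + 24 + 8 correspond to $t = 3\tfrac{1}{3}$ initial blocks developed over the 24 elements of M.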


7. Construction of semi-regular GD designs with $\lambda_1 = 0$.

(a) For a semi-regular GD design $P = rk - v\lambda_2 = 0$ by definition. Hence
from (2.0) and (2.1)

(7.0)    $r = \lambda_2 n - \lambda_1(n - 1).$

In this section we shall consider the case $\lambda_1 = 0$. This leads to $r = \lambda_2 n$, $k = m$.
Hence the parameters of the design can be written as

(7.1)    $v = mn, \quad b = n^2\lambda_2, \quad r = n\lambda_2, \quad k = m, \quad m, \quad n, \quad \lambda_1 = 0, \quad \lambda_2.$







We shall first establish the equivalence of the design (7.1) with an orthogonal
array $A = [\lambda_2 n^2, m, n, 2]$ of strength 2, which may be defined as a matrix
$A = (a_{ij})$ with m rows and $\lambda_2 n^2$ columns, for which each element $a_{ij}$ is one of the
integers 0, 1, 2, ..., n - 1, and which has the orthogonality property that for
any two rows, say i and u, the pairs $(a_{ij}, a_{uj})$, $j = 1, 2, \dots, \lambda_2 n^2$, occurring in the
corresponding columns consist of all possible ordered pairs of the integers 0, 1,
2, ..., n - 1, each repeated $\lambda_2$ times. It follows that each of the integers
0, 1, 2, ..., n - 1 appears $n\lambda_2$ times in each row of A. Orthogonal arrays have
been studied by Plackett and Burman, Rao, Bush and one of the authors (Bose),
[16], [17], [18], [19], [20], [21], [22].

THEOREM 4. The existence of a semi-regular GD design with parameters (7.1)
implies the existence of an orthogonal array $A = [\lambda_2 n^2, m, n, 2]$ of strength 2, and
conversely.

PROOF. Replace any integer x appearing in the ith row of A by the treatment
$(i - 1)n + x$. The ith row of the derived scheme now contains the treatments

(7.15)    $(i - 1)n, \ (i - 1)n + 1, \ \dots, \ (i - 1)n + n - 1.$

We shall show that the columns of the derived scheme give the blocks of the
GD design (7.1), where the ith group of treatments is (7.15). Treatments belonging
to the ith group occur only in the ith row of the derived scheme. Hence two
treatments belonging to the same group never occur together in the same
block (column). Also from the orthogonality property of A it follows that any
two treatments belonging to different groups occur together in $\lambda_2$ blocks. This
proves our statement.

Conversely, suppose there exists a semi-regular GD design with parameters 
(7.1). Let the ith group of treatr>.:ts be given by (7.15), 7 = 1, 2, --- , m. It 
has been shown in [2] that each biock of a semi-regular GD design contains the 
same number of treatments from each group. Since k = m in the present case, 
each block contains just one treatment from each block. We can now exhibit 
the blocks of (7.1) as the columns of a rectangular scheme in which the treat- 
ments of the ith group occupy the 7th row. Replacing the treatment 


((—1)n+2 


of the ith group by z,z = 1,2,---,n—1,7 = 1,2, +++ , m. We then get an 
orthogonal array A of size qn’, m constraints, n levels and strength 2. This 
proves the equivalence of the orthogonal array A and the GD design (7.1). 
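
The two directions of Theorem 4 can be sketched in code. The following is a minimal illustration, not part of the original argument: it assumes $n$ is prime so that an index-1 array can be written down with the rows $x$, $y$, $x + jy$ (mod $n$), numbers rows and treatments from 0 rather than 1, and uses the hypothetical helper names `orthogonal_array`, `array_to_design` and `design_to_array`.

```python
from collections import Counter
from itertools import combinations, product

def orthogonal_array(n, m):
    """Rows of an orthogonal array [n^2, m, n, 2] of index 1 (n prime, m <= n + 1).

    Columns are indexed by pairs (x, y); the rows are x, y, x + y, x + 2y, ... (mod n).
    """
    cols = list(product(range(n), repeat=2))
    rows = [[x for x, _ in cols], [y for _, y in cols]]
    rows += [[(x + j * y) % n for x, y in cols] for j in range(1, n)]
    return rows[:m]

def array_to_design(rows, n):
    """Replace level x in row i (0-based) by treatment i*n + x; each column is a block."""
    return [tuple(i * n + row[j] for i, row in enumerate(rows)) for j in range(len(rows[0]))]

def design_to_array(blocks, n):
    """Converse step: treatment t of group t // n goes back to level t % n."""
    rows = [[None] * len(blocks) for _ in blocks[0]]
    for j, block in enumerate(blocks):
        for t in block:
            rows[t // n][j] = t % n
    return rows

n, m = 5, 6
rows = orthogonal_array(n, m)
blocks = array_to_design(rows, n)
counts = Counter(pair for block in blocks for pair in combinations(block, 2))
pairs = list(combinations(range(m * n), 2))
lam1 = {counts[p] for p in pairs if p[0] // n == p[1] // n}   # pairs within a group
lam2 = {counts[p] for p in pairs if p[0] // n != p[1] // n}   # pairs across groups
print(len(blocks), lam1, lam2)   # 25 {0} {1}
```

With $n = 5$ and $m = 6$ this gives the parameters (7.1) with $\lambda_2 = 1$, and `design_to_array(blocks, n)` returns the array one started from.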

COROLLARY. The existence of GD design (7.1) implies the existence of the GD
design with parameters

(7.2)    $v = m_1 n$, $b = n^2\lambda_2$, $r = n\lambda_2$, $k = m_1$,
         $m_1$, $n$, $\lambda_1 = 0$, $\lambda_2$,

where $m_1 < m$.

If the GD design (7.1) is written in a form in which the columns give the blocks,
and the treatments of the $i$th group appear only in the $i$th row, then to get the
blocks of (7.2), we have simply to discard the last $m - m_1$ rows.







(b) In special cases the blocks of GD designs with parameters (7.1) can be 
obtained more expeditiously by using affine resolvable BIB designs or finite 
geometries rather than by directly using orthogonal arrays. 

A resolvable BIB design is said to be affine resolvable if any two blocks of
different replications have exactly the same number of treatments in common. It
has been shown by one of the authors (Bose) [10] that the necessary and
sufficient condition for a resolvable BIB design to be affine resolvable is

(7.25)   $b^* = v^* + r^* - 1$.

In this case the number of treatments common to blocks of two different
replications is $k^{*2}/v^*$, which must therefore be integral. The connection between
orthogonal arrays and affine resolvable BIB designs was noticed by Plackett and
Burman [19].


It is clear that if we dualize an affine resolvable BIB design with parameters
$v^*, b^*, r^*, k^*, \lambda^*$, we get a semi-regular GD design with parameters

(7.3)    $v = b^*$, $b = v^*$, $r = k^*$, $k = r^*$,
         $m = r^*$, $n = v^*/k^*$, $\lambda_1 = 0$, $\lambda_2 = k^{*2}/v^*$.

In particular the BIB designs (3.3) belonging to the series OS 1 are affine
resolvable and lead by dualization to the blocks of the GD design

(7.35)   $v = s^2 + s$, $b = s^2$, $r = s$, $k = s + 1$,
         $m = s + 1$, $n = s$, $\lambda_1 = 0$, $\lambda_2 = 1$.

From this we can get the blocks for (cf. Theorem 4, Corollary)

(7.4)    $v = ms$, $b = s^2$, $r = s$, $k = m$,
         $m$, $n = s$, $\lambda_1 = 0$, $\lambda_2 = 1$,

where $m < s + 1$.
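
The dualization step can be checked numerically. The sketch below is an illustration under stated assumptions rather than the construction of Section 3: it takes $s = p$ prime, builds the affine resolvable design as the lines of the affine plane over GF($p$), and the names `affine_plane_classes` and `dualize` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def affine_plane_classes(p):
    """Parallel classes of lines of the affine plane over GF(p), p prime.

    Returns the points and p + 1 replications, each a list of p lines of p points.
    """
    points = [(x, y) for x in range(p) for y in range(p)]
    classes = [[[pt for pt in points if (pt[1] - slope * pt[0]) % p == c] for c in range(p)]
               for slope in range(p)]
    classes.append([[pt for pt in points if pt[0] == c] for c in range(p)])  # vertical lines
    return points, classes

def dualize(points, classes):
    """Blocks of the original become treatments; treatments of the original become blocks."""
    lines = [line for cls in classes for line in cls]        # dual treatments 0 .. b* - 1
    group_of = [g for g, cls in enumerate(classes) for _ in cls]
    dual_blocks = [tuple(j for j, line in enumerate(lines) if pt in line) for pt in points]
    return dual_blocks, group_of

p = 5
points, classes = affine_plane_classes(p)
dual_blocks, group_of = dualize(points, classes)
counts = Counter(pair for block in dual_blocks for pair in combinations(block, 2))
v = len(group_of)
pairs = list(combinations(range(v), 2))
lam1 = {counts[q] for q in pairs if group_of[q[0]] == group_of[q[1]]}
lam2 = {counts[q] for q in pairs if group_of[q[0]] != group_of[q[1]]}
print(v, len(dual_blocks), len(dual_blocks[0]), lam1, lam2)   # 30 25 6 {0} {1}
```

For $p = 5$ the dual has $v = s^2 + s = 30$, $b = s^2 = 25$ and $k = s + 1 = 6$, in agreement with (7.35); the groups are the replications of the original design.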


It will appear that we can express the blocks of (7.4) in a resolvable form. This
will be illustrated by considering the special case $s = 4$. The columns of scheme
(3.5) give the blocks of the BIB design $v^* = 16$, $b^* = 20$, $r^* = 5$, $k^* = 4$, $\lambda^* = 1$
in a resolvable form. Let us write down the dual of this design. The blocks of the
dual corresponding to the treatments of the original can now be numbered
$0, 1, 2, \ldots, 14$ and $\infty$. Also the treatments of the dual corresponding to the
blocks of the original can be numbered $1, 2, \ldots, 20$, and can be divided into
five groups corresponding to the replications. If in the original (3.5), the
treatment $i$ occurs in the block $j$, in the dual we put the treatment $j$ in the block $i$.
The blocks of the dual are then given by the columns of the following scheme,
where the last column corresponds to the block $\infty$.

    [rows 1 and 2, containing the treatments 1 to 8, are illegible in the source]
    11 11 12  9 10  9  9 12 10 11 10 10 12 11  9 12
    13 15 15 16 13 14 13 13 16 14 15 14 14 16 15 16
    19 17 19 19 20 17 18 17 17 20 18 19 18 18 20 20.

Finally, we rearrange the blocks so that all blocks containing the same
treatment of the last group come together, and arrive at the scheme

         [rows 1 and 2, containing the treatments 1 to 8, are illegible in the source]
(7.5)    11  9 12 10 |  9 10 12 11 | 11 12  9 10 | 10 11  9 12
         15 14 13 16 | 13 15 14 16 | 13 15 16 14 | 13 14 15 16
         17 17 17 17 | 18 18 18 18 | 19 19 19 19 | 20 20 20 20

Taking only the first $m$ rows of the scheme (7.5) the columns give the blocks
of the semi-regular GD design

(7.55)   $v = 4m$, $b = 16$, $r = 4$, $k = m$,
         $m$, $n = 4$, $\lambda_1 = 0$, $\lambda_2 = 1$.

When $m < 5$, the design is in a resolvable form, the replications being separated
by the vertical lines.

(c) The connection between orthogonal arrays and finite geometries is given 
in [22]. We shall now illustrate the use of finite geometries in obtaining the 
blocks of semi-regular GD designs. 

Consider the finite projective geometry $PG(3, p^n)$, where $p$ is a prime, and
set $s = p^n$. There are exactly $s^2 + s + 1$ lines passing through any point $O$.
Let us choose $O = (0, 0, 0, 1)$. Choose any $m \le s^2 + s + 1$ lines through $O$, and
let the points other than $O$ on these lines correspond to the treatments. We
then have $ms$ treatments divided into $m$ groups, the $s$ treatments corresponding
to points on the same line forming a group. There are $s^3$ planes not passing
through $O$. Each of these planes intersects a line through $O$ in a unique point. Hence
if we take these planes for blocks, then each block would contain exactly one
treatment from each group. Also any treatment is contained in $s^2$ blocks. Two
treatments belonging to the same group do not occur together in any block,
but the points corresponding to two treatments of different groups are joined
by a line through which $s$ of the planes chosen for blocks pass. Hence two
treatments not belonging to the same group occur together in $s$ blocks. We thus get
a semi-regular GD design with parameters

(7.6)    $v = ms$, $b = s^3$, $r = s^2$, $k = m$,
         $m$, $n = s$, $\lambda_1 = 0$, $\lambda_2 = s$,

where $m \le s^2 + s + 1$.
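
The counting argument above can be verified by direct enumeration. The following sketch assumes $s = p$ is prime, so that arithmetic mod $p$ serves as the field; the coordinates chosen for points and planes and the name `pg3_design` are illustrative choices, not taken from the text.

```python
from collections import Counter
from itertools import combinations, product

def pg3_design(p, m):
    """Blocks of (7.6) from PG(3, p), p prime, using m lines through O = (0, 0, 0, 1)."""
    # A line through O is fixed by a direction (d1, d2, d3), normalised so that
    # its first nonzero coordinate is 1; there are p^2 + p + 1 of them.
    directions = [d for d in product(range(p), repeat=3)
                  if any(d) and next(x for x in d if x) == 1]
    lines = directions[:m]
    blocks = []
    # Every plane not through O can be written a*x1 + b*x2 + c*x3 + x4 = 0.
    for a, b, c in product(range(p), repeat=3):
        block = []
        for g, (d1, d2, d3) in enumerate(lines):
            t = -(a * d1 + b * d2 + c * d3) % p   # unique point (d1, d2, d3, t) of line g on the plane
            block.append(g * p + t)               # group g holds treatments g*p .. g*p + p - 1
        blocks.append(tuple(block))
    return blocks

p, m = 3, 13
blocks = pg3_design(p, m)
counts = Counter(pair for block in blocks for pair in combinations(block, 2))
reps = Counter(t for block in blocks for t in block)
pairs = list(combinations(range(m * p), 2))
lam1 = {counts[q] for q in pairs if q[0] // p == q[1] // p}
lam2 = {counts[q] for q in pairs if q[0] // p != q[1] // p}
print(len(blocks), set(reps.values()), lam1, lam2)   # 27 {9} {0} {3}
```

For $p = 3$ and $m = 13$ this reproduces $b = s^3 = 27$, $r = s^2 = 9$, $\lambda_1 = 0$, $\lambda_2 = s = 3$.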


We shall now show that if $m \le s^2$, then the blocks can be obtained in a
resolvable form. Choose any plane through $O$, say $x_3 = 0$, and call it the fundamental
plane. There are $s^2$ lines on the fundamental plane not passing through $O$.
Through each of these lines there pass $s$ planes chosen as blocks, which obviously
give a complete replication provided that none of the $m$ lines, the points of
which (other than $O$) give the treatments, lie on the fundamental plane. Since
there are $s^2$ lines through $O$ not lying on the fundamental plane, we can get the
blocks of (7.6) in a resolvable form if $m \le s^2$.







Again if $s^2 < m \le s^2 + s$, we can divide the blocks into $s$ sets of $s^2$ each, such
that the blocks of any set give $s$ complete replications. This can be done by
taking a fundamental line, say $x_2 = 0$, $x_3 = 0$. Let the lines whose points
correspond to the treatments be different from the fundamental line. Then the $s^2$
blocks corresponding to planes passing through the same point of the
fundamental line give $s$ complete replications.

The equation of any plane not passing through $O$ may be put in the form
$ax_1 + bx_2 + cx_3 + x_4 = 0$ where $a$, $b$, $c$ are elements of the Galois field $GF(p^n)$.
Varying $a$, $b$, $c$ we get all the $s^3$ planes. The $s$ planes, for which $a$ and $b$ remain
fixed but $c$ takes the $s$ different possible values, give a complete replication (when
none of the lines, whose points correspond to the treatments, lie in $x_3 = 0$),
and the $s^2$ planes, for which $a$ remains fixed, but $b$ and $c$ take all possible values,
give a set of $s$ complete replications (when $x_2 = 0$, $x_3 = 0$ is not one of the lines
whose points correspond to the treatments). After the blocks have been
calculated the points representing the treatments may be identified with the
treatments $1, 2, \ldots, ms$.
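
The grouping of planes by the coefficients $(a, b)$ can be sketched as follows. This is an illustrative check assuming $s = p$ prime; the lines through $O$ are taken with direction $(d_1, d_2, 1)$ so that none lies in the fundamental plane $x_3 = 0$, and the helper name `replications` is hypothetical.

```python
from itertools import product

def replications(p, lines):
    """Group the planes a*x1 + b*x2 + c*x3 + x4 = 0 by (a, b); c runs over GF(p)."""
    reps = {}
    for a, b, c in product(range(p), repeat=3):
        block = tuple(g * p + (-(a * d1 + b * d2 + c * d3) % p)
                      for g, (d1, d2, d3) in enumerate(lines))
        reps.setdefault((a, b), []).append(block)
    return reps

p = 3
# The p^2 lines through O = (0, 0, 0, 1) that do not lie in the plane x3 = 0.
lines = [(d1, d2, 1) for d1 in range(p) for d2 in range(p)]
reps = replications(p, lines)
v = len(lines) * p
complete = all(sorted(t for block in blocks for t in block) == list(range(v))
               for blocks in reps.values())
print(len(reps), complete)   # 9 True
```

Each of the $s^2 = 9$ classes contains $s = 3$ blocks and covers every one of the 27 treatments exactly once, which is the resolvability claimed for $m \le s^2$.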

Using $PG(3, 2)$ we find that, if we retain only the first $m$ rows of the scheme
(7.7), then the columns represent the 8 blocks of the semi-regular GD design

(7.65)   $v = 2m$, $b = 8$, $r = 4$, $k = m$,
         $m$, $n = 2$, $\lambda_1 = 0$, $\lambda_2 = 2$.

The vertical lines separate the replications.

(7.7)    [scheme of 7 rows and 8 columns; its entries are not legible in the source]

The groups for (7.65) are given by the first $m$ columns of

(7.75)   1  3  5  7  9 11 13
         2  4  6  8 10 12 14.

Similarly using $PG(3, 3)$ we find that, if we retain only the first $m$ rows of the
scheme (7.83), then the columns represent the 27 blocks of the semi-regular
GD design

(7.8)    $v = 3m$, $b = 27$, $r = 9$, $k = m$,
         $m$, $n = 3$, $\lambda_1 = 0$, $\lambda_2 = 3$.

As before the vertical lines separate the replications.

(7.83)   [scheme of 13 rows and 27 columns; its entries are not fully legible in the source]

The groups for (7.8) are given by the first $m$ columns of

         1  4  7 10 13 16 19 22 25 28 31 34 37
(7.86)   2  5  8 11 14 17 20 23 26 29 32 35 38
         3  6  9 12 15 18 21 24 27 30 33 36 39.


(d) Whenever an orthogonal array $[\lambda_2 n^2, m, n, 2]$ of strength 2 is directly
available we can use it for obtaining the blocks of (7.1). The procedure to be
followed has already been explained in the proof of Theorem 4.

Using the array [18, 7, 3, 2] given in [22], we get the blocks of the semi-regular
GD design

         $v = 3m$, $b = 18$, $r = 6$, $k = m$,
         $m$, $n = 3$, $\lambda_1 = 0$, $\lambda_2 = 2$,

where $m \le 7$, by retaining only the first $m$ rows of the scheme

(7.9)    [scheme of 7 rows and 18 columns; its entries are not legible in the source]

As before, the blocks are given by columns, and the vertical lines divide
complete replications. Thus the design is resolvable for $m \le 6$. The groups are given
by the first $m$ columns of

         1  4  7 10 13 16 19
(7.94)   2  5  8 11 14 17 20
         3  6  9 12 15 18 21.

Similarly using the array [32, 9, 4, 2] given in [22] we can get the blocks of the
semi-regular GD design

(7.96)   $v = 4m$, $b = 32$, $r = 8$, $k = m$,
         $m$, $n = 4$, $\lambda_1 = 0$, $\lambda_2 = 2$,

if $m \le 9$. The design can be obtained in a resolvable form if $m \le 8$.
Plackett and Burman [19] have given orthogonal arrays $[4\lambda, 4\lambda - 1, 2, 2]$
for all integral $\lambda \le 25$, except $\lambda = 23$. These may be used to obtain the blocks
of the corresponding semi-regular GD designs with parameters

(7.98)   $v = 2m$, $b = 4\lambda$, $r = 2\lambda$, $k = m$,
         $m$, $n = 2$, $\lambda_1 = 0$, $\lambda_2 = \lambda$,

where $m \le 4\lambda - 1$. Of course only small values of $\lambda$ and $m$ yield designs of
practical interest.

We present in Table VII the parameters of semi-regular GD designs for which
$r \le 10$, $\lambda_1 = 0$, $\lambda_2 \le 3$, and the blocks for which can be obtained by the methods
discussed in this section. The parameter $m$ has been kept arbitrary, but the
maximum value of $m$ for which the design exists and also the maximum value
of $m$ for which the design can be obtained in a resolvable form has been given.

TABLE VII
Parameters of semi-regular GD designs with $\lambda_1 = 0$, $\lambda_2 \le 3$, $r \le 10$

[The body of the table (serial numbers (1) to (14); columns $v$, $b$, $r$, $k$, $m$, $n$,
$\lambda_1$, $\lambda_2$, maximum $m$, maximum $m$ for resolvability) is not legible in the source.]

Number (12) is the duplicate of number (3), that is, is obtained by repeating
each block of (3) twice. Numbers (4) and (8) can be obtained by first writing
down the orthogonal array $[n^2, 3, n, 2]$ corresponding to an $n \times n$ Latin square,
$n = 6, 10$, as it is well known that a set of $m - 2$ mutually orthogonal $n \times n$
Latin squares is equivalent to an orthogonal array $[n^2, m, n, 2]$ (cf. [18], [21]).


8. Construction of semi-regular GD designs for which $\lambda_1 \ne 0$, $\lambda_2 \ne 0$. Now
$P = rk - v\lambda_2 = 0$ by definition, and $k = cm$ since each block contains the
same number $c$ of treatments from each group [2]. Using (2.0) and (2.1), the
eight parameters of the design can be expressed in terms of $m$, $n$, $\lambda_2$ and $c$ only.
Thus the parameters are

(8.0)    $v = mn$, $b = n^2\lambda_2/c^2$, $r = n\lambda_2/c$, $k = cm$,
(8.1)    $m$, $n$, $\lambda_1 = n(c - 1)\lambda_2/(n - 1)c$, $\lambda_2$.

Also as proved in [2] for a semi-regular GD design,

(8.2)    $b \ge v - m + 1$,

(8.3)    $m \le \dfrac{b - 1}{n - 1} = \dfrac{n^2\lambda_2 - c^2}{c^2(n - 1)}$.

TABLE VIIIA
Parameters of semi-regular GD designs with $\lambda_1 \ne 0$, $r \le 10$

Serial no.     v     b     r     k     m     n    λ1    λ2    Maximum m
   (1)        4m    12     6    2m     m     4     2     3        3
   (2)        3m     9     6    2m     m     3     3     4        4
   (3)        6m    20    10    3m     m     6     4     5        3

The values of $n$, $c$ and $\lambda_2$ must be such as to make $b$, $r$ and $\lambda_1$ integral, but
$m$ may be any integer subject to (8.3). It follows that, if $\lambda_1 \ne 0$, the only
semi-regular GD designs in the range $r \le 10$ are those listed in Table VIIIA.
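
The search implied by (8.0), (8.1) and (8.3) is small enough to carry out mechanically. The sketch below is an illustration only: the function name `semi_regular_designs` and the search bound `n_max` are assumptions made here, as is the requirement that at least two groups be possible; $c = 1$ is skipped because it gives $\lambda_1 = 0$, and $c = n$ because it gives $k = v$.

```python
from fractions import Fraction

def semi_regular_designs(r_max=10, n_max=60):
    """List (n, c, lam2, b, r, lam1, m_max) allowed by (8.0), (8.1) and (8.3)."""
    found = []
    for n in range(2, n_max + 1):
        for c in range(2, n):
            lam2 = 1
            while Fraction(n * lam2, c) <= r_max:
                r = Fraction(n * lam2, c)                          # (8.0)
                b = Fraction(n * n * lam2, c * c)                  # (8.0)
                lam1 = Fraction(n * (c - 1) * lam2, (n - 1) * c)   # (8.1)
                if r.denominator == b.denominator == lam1.denominator == 1:
                    m_max = (int(b) - 1) // (n - 1)                # (8.3)
                    if m_max >= 2:
                        found.append((n, c, lam2, int(b), int(r), int(lam1), m_max))
                lam2 += 1
    return found

for row in semi_regular_designs():
    print(row)
```

Under these assumptions the search returns three parameter sets, $(n, c, \lambda_2) = (3, 2, 4)$, $(4, 2, 3)$ and $(6, 3, 5)$, which correspond to designs (2), (1) and (3) of Table VIIIA.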

It is clear that, if the blocks and groups for the above designs can be ob- 
tained for the maximum value of m, then for any smaller value of m we have 
only to discard some of the groups and the treatments belonging to them. 
The groups and blocks for the designs in Table VIIIA are given in Table VIIIB 
(for the maximum value of m). 

TABLE VIIIB
Blocks and groups for semi-regular GD designs with $\lambda_1 \ne 0$, $r \le 10$

Serial no.   Groups                                      Initial blocks                                                   Modulus

   (1)       $(00_1, 01_1, 10_1, 11_1)$                  $(00_1, 01_1;\ 00_2, 10_2;\ 00_3, 11_3)$                         mod (2, 2)
             $(00_2, 01_2, 10_2, 11_2)$                  $(00_1, 11_1;\ 00_2, 01_2;\ 00_3, 10_3)$
             $(00_3, 01_3, 10_3, 11_3)$                  $(00_1, 10_1;\ 00_2, 11_2;\ 00_3, 01_3)$

   (2)       $(0_1, 1_1, 2_1)$                           $(0_1, 1_1;\ 0_2, 2_2;\ 0_3, 2_3;\ \infty_2, \infty_3)$          mod (3)
             $(0_2, 1_2, 2_2)$                           $(0_1, 2_1;\ 0_2, 1_2;\ 0_3, 2_3;\ \infty_3, \infty_1)$
             $(0_3, 1_3, 2_3)$                           $(0_1, 2_1;\ 0_2, 2_2;\ 0_3, 1_3;\ \infty_1, \infty_2)$
             $(\infty_1, \infty_2, \infty_3)$

   (3)       $(0_1, 1_1, 2_1, 3_1, 4_1, \infty_1)$       $(0_1, 1_1, 2_1;\ 1_2, 3_2, 4_2;\ 0_3, 1_3, 2_3)$                mod (5)
             $(0_2, 1_2, 2_2, 3_2, 4_2, \infty_2)$       $(\infty_1, 3_1, 4_1;\ \infty_2, 0_2, 2_2;\ 0_3, 1_3, 2_3)$
             $(0_3, 1_3, 2_3, 3_3, 4_3, \infty_3)$       $(\infty_1, 0_1, 2_1;\ 0_2, 1_2, 2_2;\ \infty_3, 0_3, 2_3)$
                                                         $(1_1, 3_1, 4_1;\ \infty_2, 3_2, 4_2;\ \infty_3, 0_3, 2_3)$

Here the groups have been given in full, and only the blocks have to be
developed. The validity of the solution follows from the notion of differences
developed in [8] and explained in section 6(a) of the present paper. For illustration
we shall consider design (3) of Tables VIIIA and VIIIB, when $m$ has the maximum
value 3, and prove that the initial blocks shown give rise to it when developed.

The 18 treatments form three groups shown in the 2nd column of Table
VIIIB. The 15 treatments other than $\infty_1$, $\infty_2$, $\infty_3$ fall into three classes
according to the suffix carried (cf. section 6(a)). We shall distinguish three different
types of pairs.

(i) Pairs of the type $(\infty_i, \infty_j)$; $i \ne j$; $i, j = 1, 2, 3$. Each of the three pairs
$(\infty_1, \infty_2)$, $(\infty_2, \infty_3)$, $(\infty_3, \infty_1)$ occurs in just one initial block shown in the 3rd
column of Table VIIIB. Since $\infty$ and the suffixes remain invariant when the blocks
are developed, each of these pairs occurs five times in the completed design, as
it should since $\lambda_2 = 5$ and $\infty_i$ and $\infty_j$ ($i \ne j$) belong to different groups.

(ii) Pairs of the type $(\infty_i, u_j)$; $i, j = 1, 2, 3$; where $u$ is an element of the field
of residue classes, mod (5). When developed, the pair $(\infty_i, u_j)$ gives rise to
five pairs, of which one component is $\infty_i$ and the second component varies over
all the five treatments of the $j$th class. In the initial blocks, $\infty_i$ occurs with just
4 treatments of the $j$th class, if $i = j$, and 5 elements of the $j$th class, if $i \ne j$.
It follows that any pair $(\infty_i, u_j)$ occurs 4 times in the completed design if $i = j$
and 5 times if $i \ne j$. This is as it should be, since $\infty_i$ and $u_j$ do or do not belong
to the same group according as $i = j$ or $i \ne j$ and $\lambda_1 = 4$, $\lambda_2 = 5$.

(iii) Pairs of the type $(u_i, u'_j)$; $i, j = 1, 2, 3$, where $u$ and $u'$ are elements of the
field of residue classes mod (5). It can be verified that leaving out $\infty_1$, $\infty_2$,
$\infty_3$ the initial blocks give rise to each pure difference 4 times and each mixed
difference 5 times. Hence in the completed design any pair $(u_i, u'_j)$ occurs 4
times if $i = j$ and 5 times if $i \ne j$, as it should, since $u_i$ and $u'_j$ do or do not
belong to the same group according as $i = j$ or $i \ne j$.

Again it is easy to see that each of the treatments $\infty_1$, $\infty_2$, $\infty_3$ occurs 10
times in the completed design, since each of these occurs twice in the initial
blocks. The other treatments also occur 10 times in the completed design, since
each class is represented 10 times in the initial blocks. This completes the proof.
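
The proof can be cross-checked by developing the initial blocks mechanically. The sketch below is illustrative: the initial blocks of design (3) are transcribed from Table VIIIB as (element, suffix) pairs, the string `"inf"` stands for the invariant symbol $\infty$, and the name `develop` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

INF = "inf"   # the invariant symbol, written as infinity in the text

# Initial blocks of design (3), each treatment given as (element, suffix).
initial_blocks = [
    [(0, 1), (1, 1), (2, 1), (1, 2), (3, 2), (4, 2), (0, 3), (1, 3), (2, 3)],
    [(INF, 1), (3, 1), (4, 1), (INF, 2), (0, 2), (2, 2), (0, 3), (1, 3), (2, 3)],
    [(INF, 1), (0, 1), (2, 1), (0, 2), (1, 2), (2, 2), (INF, 3), (0, 3), (2, 3)],
    [(1, 1), (3, 1), (4, 1), (INF, 2), (3, 2), (4, 2), (INF, 3), (0, 3), (2, 3)],
]

def develop(blocks, modulus):
    """Add each residue to every finite element; INF and the suffixes stay fixed."""
    out = []
    for block in blocks:
        for shift in range(modulus):
            out.append(frozenset((u if u == INF else (u + shift) % modulus, j)
                                 for u, j in block))
    return out

design = develop(initial_blocks, 5)
treatments = sorted({t for block in design for t in block}, key=str)
counts = Counter(frozenset(pair) for block in design for pair in combinations(block, 2))
reps = Counter(t for block in design for t in block)
# Two treatments are in the same group exactly when they carry the same suffix.
lam1 = {counts[frozenset((s, t))] for s, t in combinations(treatments, 2) if s[1] == t[1]}
lam2 = {counts[frozenset((s, t))] for s, t in combinations(treatments, 2) if s[1] != t[1]}
print(len(design), set(reps.values()), lam1, lam2)   # 20 {10} {4} {5}
```

The output agrees with $b = 20$, $r = 10$, $\lambda_1 = 4$, $\lambda_2 = 5$ for design (3) with $m = 3$.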

If, in design (3) of Table VIIIA, $m = 2$, then the corresponding blocks can
be obtained by developing the initial blocks shown in Table VIIIB, after dropping
the treatments with suffix 3. It should be noted that the first two initial
blocks now give a complete replication, and the same is true of the last two
initial blocks. Hence the blocks are obtained in a resolvable form.


9. GD designs derivable by replication addition and subtraction. Consider a
BIB design with parameters $v^*, b^*, r^*, k^*, \lambda^*$ in which $v^*$ is divisible by $k^*$, and
suppose that either a resolvable solution is known, or at least a solution is known
in a form where there are $v^*/k^*$ blocks which give a complete replication. Then
we can get a GD design with parameters

(9.0)    $v = v^*$, $b = tb^* + a(v^*/k^*)$, $r = tr^* + a$, $k = k^*$,
(9.1)    $m = v^*/k^*$, $n = k^*$, $\lambda_1 = t\lambda^* + a$, $\lambda_2 = t\lambda^*$

in the following manner. Choose a set of $v^*/k^*$ blocks giving a complete
replication. Repeat the BIB design $t$ times, and then add the chosen set of blocks $a$
times. Then we get a GD design with parameters given by (9.0) and (9.1), for
which the groups are given by the chosen set of blocks.

When the BIB design is repeated $t$ times, the chosen set of blocks is also
repeated $t$ times. Hence instead of adding the chosen set of blocks $a$ times, we could
delete the chosen set of blocks $a_1$ times ($a_1 \le t$). This would give a GD design
with parameters (9.0) and (9.1) with $a = -a_1$. If the original BIB design is
resolvable, then the derived GD design is also resolvable.

For example, if we start with the BIB designs of the series OS 1 whose
parameters are given by (3.3), we get resolvable GD designs with parameters

(9.2)    $v = s^2$, $b = t(s^2 + s) + as$, $r = t(s + 1) + a$, $k = s$,
(9.3)    $m = s$, $n = s$, $\lambda_1 = t + a$, $\lambda_2 = t$,

where $a \ge -t$, and $s$ is a prime or a prime power. As an illustration let $s = 4$,
$t = 1$, $a = -1$. The blocks of the BIB design $v^* = 16$, $b^* = 20$, $r^* = 5$, $k^* = 4$,
$\lambda^* = 1$ are given in a resolvable form by (3.5). Hence the blocks of the GD
design with parameters

         $v = 16$, $b = 16$, $r = 4$, $k = 4$,
         $m = 4$, $n = 4$, $\lambda_1 = 0$, $\lambda_2 = 1$

are obtained by taking any four replications from (3.5); the remaining
replication then gives the groups.
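
Replication addition and subtraction can be sketched as follows. The example is a hedged illustration: it assumes $s$ prime, so that the OS 1 design can be taken as the lines of the affine plane over GF($s$) (the case $s = 4$ of the text would need GF(4) arithmetic), and the names `affine_plane_replications`, `add_or_subtract` and `lambdas` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def affine_plane_replications(s):
    """The s + 1 parallel classes of lines of the affine plane over GF(s), s prime."""
    reps = [[frozenset((x, (slope * x + c) % s) for x in range(s)) for c in range(s)]
            for slope in range(s)]
    reps.append([frozenset((c, y) for y in range(s)) for c in range(s)])
    return reps

def add_or_subtract(reps, t, a, chosen=0):
    """Repeat the BIB design t times, then add (a > 0) or delete (a < 0) the chosen replication."""
    if a < -t:
        raise ValueError("the chosen replication cannot be deleted more than t times")
    blocks = []
    for i, rep in enumerate(reps):
        copies = t + a if i == chosen else t
        blocks.extend(list(rep) * copies)
    return blocks, list(reps[chosen])      # the chosen blocks serve as the groups

def lambdas(blocks, groups):
    group_of = {p: g for g, grp in enumerate(groups) for p in grp}
    counts = Counter(frozenset(pair) for blk in blocks for pair in combinations(blk, 2))
    pts = sorted(group_of)
    lam1 = {counts[frozenset((p, q))] for p, q in combinations(pts, 2) if group_of[p] == group_of[q]}
    lam2 = {counts[frozenset((p, q))] for p, q in combinations(pts, 2) if group_of[p] != group_of[q]}
    return lam1, lam2

blocks_sub, groups_sub = add_or_subtract(affine_plane_replications(5), t=1, a=-1)
print(len(blocks_sub), lambdas(blocks_sub, groups_sub))   # 25 ({0}, {1})
blocks_add, groups_add = add_or_subtract(affine_plane_replications(5), t=1, a=2)
print(len(blocks_add), lambdas(blocks_add, groups_add))   # 40 ({3}, {1})
```

With $s = 5$ the two calls give $b = 25$, $\lambda_1 = 0$, $\lambda_2 = 1$ and $b = 40$, $\lambda_1 = 3$, $\lambda_2 = 1$, as (9.2) and (9.3) predict for $(t, a) = (1, -1)$ and $(1, 2)$.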

The blocks of BIB designs belonging to the series OS 1 can be obtained in a
resolvable form as explained in Section 3, by using the difference sets in Table I.
The blocks for all other BIB designs occurring in Table IX can be found in
Table III, being in a resolvable form in every case except $v^* = 45$, $b^* = 99$,
$r^* = 11$, $k^* = 5$, $\lambda^* = 1$. In this case the block $(00_1, 00_2, 00_3, 00_4, 00_5)$, when
developed mod (3, 3), provides a complete replication.





TABLE IX
Parameters of GD designs derivable from BIB designs by replication addition or
subtraction

Serial no.     Parameters of BIB design     Auxiliary      Parameters of GD design
and series                                  parameters
               v*   b*   r*   k*   λ*       t     a        v    b    r    k    m    n   λ1   λ2

 (1) OS 1      16   20    5    4    1       1    -1       16   16    4    4    4    4    0    1
 (2) OS 1      16   20    5    4    1       1     1       16   24    6    4    4    4    2    1
 (3) OS 1      16   20    5    4    1       1     2       16   28    7    4    4    4    3    1
 (4) OS 1      16   20    5    4    1       2    -2       16   32    8    4    4    4    0    2
 (5) OS 1      16   20    5    4    1       2    -1       16   36    9    4    4    4    1    2
 (6) OS 1      25   30    6    5    1       1    -1       25   25    5    5    5    5    0    1
 (7) OS 1      25   30    6    5    1       1     1       25   35    7    5    5    5    2    1
 (8) OS 1      25   30    6    5    1       1     2       25   40    8    5    5    5    3    1
 (9) OS 1      25   30    6    5    1       2    -2       25   50   10    5    5    5    0    2
(10) OS 1      49   56    8    7    1       1    -1       49   49    7    7    7    7    0    1
(11) OS 1      49   56    8    7    1       1     1       49   63    9    7    7    7    2    1
(12) OS 1      49   56    8    7    1       1     2       49   70   10    7    7    7    3    1
(13) OS 1      64   72    9    8    1       1    -1       64   64    8    8    8    8    0    1
(14) OS 1      64   72    9    8    1       1     1       64   80   10    8    8    8    2    1
(15) OS 1      81   90   10    9    1       1    -1       81   81    9    9    9    9    0    1
(16) T_1       15   35    7    3    1       1    -1       15   30    6    3    5    3    0    1
(17) T_1       15   35    7    3    1       1     1       15   40    8    3    5    3    2    1
(18) T_1       15   35    7    3    1       1     2       15   45    9    3    5    3    3    1
(19) F_2       28   63    9    4    1       1    -1       28   56    8    4    7    4    0    1
(20) F_2       28   63    9    4    1       1     1       28   70   10    4    7    4    2    1
(21) T_1       21   70   10    3    1       1    -1       21   63    9    3    7    3    0    1
(22) G_2       45   99   11    5    1       1    -1       45   90   10    5    9    5    0    1

10. Extension of GD designs. Suppose that there exists a resolvable group
divisible design with parameters

(10.0)   $v = ka$, $b = ra$, $r$, $k$, $m$, $n$, $\lambda_1$, $\lambda_2 = 1$,

so that the $b$ blocks are divisible into $r$ sets of $a$ blocks, each set giving a
complete replication. Let

(10.1)   $v' = r$, $b' = \dfrac{r(r - a)}{k + 1}$, $r' = r - a$, $k' = k + 1$,
         $m' = \dfrac{r}{n}$, $n' = n$, $\lambda_1' = \lambda_1$, $\lambda_2' = 1$.

Then clearly

(10.2)   $v' = m'n'$, $b'k' = v'r'$,

and it follows from (2.0) and (2.1) that

(10.25)  $\lambda_1'(n' - 1) + \lambda_2' n'(m' - 1) = r'(k' - 1)$.

Hence if $b'$ and $m'$ are integers, the parameters $v'$, $b'$, $r'$, $k'$, $m'$, $n'$, $\lambda_1'$, $\lambda_2'$ given
by (10.1) can be the parameters of a GD design. Suppose a combinatorial
solution of this design is available. We shall show that in this case we can build up a
solution of the GD design with parameters

(10.3)   $v'' = v + v'$, $b'' = b + b'$, $r'' = r$, $k'' = k + 1$,
         $m'' = m + m'$, $n'' = n$, $\lambda_1'' = \lambda_1$, $\lambda_2'' = 1$.

Let the treatments in (10.0) and (10.1) be different so that there are altogether
$v + v'$ treatments. To each block in the $i$th replication of (10.0) adjoin the $i$th
treatment of (10.1), ($i = 1, 2, \ldots, r$). To the design (10.0) so extended, add
all the blocks of (10.1). This gives us a combinatorial solution of (10.3) where
the groups are the groups of (10.0) and (10.1) taken together. It is easy to see
that the necessary conditions are satisfied. This method may be called the
method of extension.
As an illustration we shall build up the solution of the GD design

(10.4)   $v = 12$, $b = 24$, $r = 6$, $k = 3$,
         $m = 6$, $n = 2$, $\lambda_1 = 2$, $\lambda_2 = 1$,

starting from a solution of

(10.45)  $v = 6$, $b = 18$, $r = 6$, $k = 2$,
         $m = 3$, $n = 2$, $\lambda_1 = 2$, $\lambda_2 = 1$,

which can be obtained by adding one complete replication, say the last, to the
solution (3.1) of the BIB design (3.0). Here $a = 3$, and we see from (10.1) that
for extension we require a solution of

(10.5)   $v' = 6$, $b' = 6$, $r' = 3$, $k' = 3$,
         $m' = 3$, $n' = 2$, $\lambda_1' = 2$, $\lambda_2' = 1$.

It is seen from Theorem 2 that a solution of this is obtainable by developing
mod (6) the initial block (0, 1, 3). However to keep the treatments of (10.5)
distinct from those of (10.45) we may replace the $i$th treatment of (10.5) by
$a_i$. Proceeding as explained, the blocks of (10.4) are given by the columns of
the scheme

(10.6)   [scheme of 3 rows and 24 columns; the first 18 columns, which carry the
         blocks of (10.45) with the adjoined treatments $a_0, \ldots, a_5$ in the third row,
         are not legible in the source; the last six columns are]

         $a_0$  $a_1$  $a_2$  $a_3$  $a_4$  $a_5$
         $a_1$  $a_2$  $a_3$  $a_4$  $a_5$  $a_0$
         $a_3$  $a_4$  $a_5$  $a_0$  $a_1$  $a_2$,

and the groups are given by the columns of

(10.65)  0   1   4        $a_0$  $a_1$  $a_2$
         3   2   $\infty$   $a_3$  $a_4$  $a_5$.
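
The method of extension can be tried out on the parameters of this illustration. The sketch below does not use the solution (3.1), which is not reproduced here; instead it assumes a standard round-robin arrangement of the 15 pairs of six treatments as the resolvable starting design, numbers the treatments $a_0, \ldots, a_5$ as 6 to 11, and labels the sixth old treatment `INF`.

```python
from collections import Counter
from itertools import combinations

INF = 5   # label used here for the sixth treatment of the starting design

# Five rounds of a round-robin on {0, ..., 4, INF}: a resolvable BIB design with
# v* = 6, b* = 15, r* = 5, k* = 2. Repeating round 0 gives parameters (10.45).
rounds = [[(INF, i), ((i + 1) % 5, (i - 1) % 5), ((i + 2) % 5, (i - 2) % 5)] for i in range(5)]
rounds.append(list(rounds[0]))
old_groups = [set(pair) for pair in rounds[0]]      # groups = blocks of the added replication

# Design (10.5): develop the initial block (0, 1, 3) mod 6, renumbered 6..11.
new_blocks = [(6 + j, 6 + (j + 1) % 6, 6 + (j + 3) % 6) for j in range(6)]
new_groups = [{6 + j, 6 + j + 3} for j in range(3)]

# Extension: adjoin the i-th new treatment to every block of the i-th replication,
# then add all the blocks of (10.5).
blocks = [pair + (6 + i,) for i, rnd in enumerate(rounds) for pair in rnd] + new_blocks
groups = old_groups + new_groups

group_of = {t: g for g, grp in enumerate(groups) for t in grp}
counts = Counter(frozenset(pair) for blk in blocks for pair in combinations(blk, 2))
reps = Counter(t for blk in blocks for t in blk)
lam1 = {counts[frozenset(q)] for q in combinations(range(12), 2) if group_of[q[0]] == group_of[q[1]]}
lam2 = {counts[frozenset(q)] for q in combinations(range(12), 2) if group_of[q[0]] != group_of[q[1]]}
print(len(blocks), set(reps.values()), lam1, lam2)   # 24 {6} {2} {1}
```

The result has $b = 24$, $r = 6$, $k = 3$, $\lambda_1 = 2$, $\lambda_2 = 1$ with six groups of two, which are the parameters (10.4).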
Again we can build up the solution of the GD design

(10.7)   $v = 24$, $b = 54$, $r = 9$, $k = 4$,
         $m = 8$, $n = 3$, $\lambda_1 = 3$, $\lambda_2 = 1$,

by starting with the design

(10.8)   $v = 15$, $b = 45$, $r = 9$, $k = 3$,
         $m = 5$, $n = 3$, $\lambda_1 = 3$, $\lambda_2 = 1$,

which is design (18) of Table IX, and using for extension the solution of

(10.9)   $v' = 9$, $b' = 9$, $r' = 4$, $k' = 4$,
         $m' = 3$, $n' = 3$, $\lambda_1' = 3$, $\lambda_2' = 1$,

which can be obtained by developing mod (9) the initial block (0, 1, 3, 6).


11. Addition of GD designs. The method of addition consists of getting a new 
GD design by taking together the blocks of two suitable GD designs with the 
same v and k. It may be regarded as a slight generalization of the method of 
replication addition discussed in Section 9. This will be explained by two ex- 
amples. 

(a) If in (7.4) we put $m = s - 1$, we get the GD design

(11.0)   $v = s^2 - s$, $b = s^2$, $r = s$, $k = s - 1$,
         $m = s - 1$, $n = s$, $\lambda_1 = 0$, $\lambda_2 = 1$,

a solution of which is available if $s$ is a prime or a prime power.

If we take the $s$ blocks formed by taking all possible combinations of $s - 1$
treatments from the $i$th group, we get an unreduced BIB design with parameters

(11.15)  $v^* = b^* = s$, $r^* = k^* = s - 1$, $\lambda^* = s - 2$.

Repeating this for each group and taking together all the BIB designs so formed
we get the GD design

(11.2)   $v = s^2 - s$, $b = s^2 - s$, $r = s - 1$, $k = s - 1$,
         $m = s - 1$, $n = s$, $\lambda_1 = s - 2$, $\lambda_2 = 0$.

Taken by itself this is a disconnected design in the sense explained in [23]
and [24], and any contrast between treatments of different groups is
nonestimable. But if we take together the blocks of (11.0) and (11.2) we get the GD
design

(11.35)  $v = s^2 - s$, $b = 2s^2 - s$, $r = 2s - 1$, $k = s - 1$,
         $m = s - 1$, $n = s$, $\lambda_1 = s - 2$, $\lambda_2 = 1$.


As an illustration we give below the blocks for the case $s = 4$ (Design (3)
of Table X).

(11.4)   [scheme of 3 rows and 28 columns; its entries are not legible in the source]

The corresponding groups are given by the columns of the scheme

         1   5   9
(11.45)  2   6  10
         3   7  11
         4   8  12.

The first 16 blocks of (11.4) are obtained by taking the first three rows of
(7.5), whereas the remaining 12 blocks are obtained by taking all combinations
of three treatments from each group.

By taking $s = 5$ in (11.35) we get design (4) of Table X.

TABLE X
Parameters of GD designs obtainable by extension and addition

Serial no.     v     b     r     k     m     n    λ1    λ2
   (1)        12    24     6     3     6     2     2     1
   (2)        24    54     9     4     8     3     3     1
   (3)        12    28     7     3     3     4     2     1
   (4)        20    45     9     4     4     5     3     1
   (5)        12    32     8     3     2     6     2     1
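
The addition of (11.0) and (11.2) can be sketched for $s = 5$. The code is an illustration under stated assumptions: $s$ is taken prime so that (11.0) can be written down from the rows $x + iy$ (mod $s$) of an orthogonal array, treatments are numbered from 0, and the name `design_by_addition` is hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def design_by_addition(s):
    """Blocks of (11.35) for prime s: design (11.0) plus all (s-1)-subsets of each group."""
    m = s - 1
    # (11.0): column (x, y) of an orthogonal array; row i holds the level x + i*y (mod s).
    first = [tuple(i * s + (x + i * y) % s for i in range(m))
             for x, y in product(range(s), repeat=2)]
    # (11.2): the unreduced BIB design (11.15) written on each group in turn.
    second = [blk for g in range(m) for blk in combinations(range(g * s, g * s + s), s - 1)]
    return first + second

s = 5
blocks = design_by_addition(s)
v = s * (s - 1)
counts = Counter(pair for blk in blocks for pair in combinations(blk, 2))
reps = Counter(t for blk in blocks for t in blk)
pairs = list(combinations(range(v), 2))
lam1 = {counts[q] for q in pairs if q[0] // s == q[1] // s}
lam2 = {counts[q] for q in pairs if q[0] // s != q[1] // s}
print(v, len(blocks), set(reps.values()), lam1, lam2)   # 20 45 {9} {3} {1}
```

The output matches (11.35) with $s = 5$: $v = 20$, $b = 45$, $r = 9$, $k = 4$, $\lambda_1 = 3$, $\lambda_2 = 1$.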
(b) Suppose we have solutions available for GD designs with parameters

(11.5)   $v = mn$, $b$, $r$, $k$, $m$, $n$, $\lambda_1$, $\lambda_2$,
(11.6)   $v' = mn/a$, $b'$, $r'$, $k' = k$, $m' = m/a$, $n' = n$, $\lambda_1'$, $\lambda_2'$,

where $m'$ and $a$ are integers, and

(11.65)  $\lambda_1 + \lambda_1' = \lambda_2 + \lambda_2' = \lambda_1''$ (say).

The $m$ groups of (11.5) can be divided into $a$ sets each of $m'$ groups. With the
$v'$ treatments occurring in any such set we can write down a solution for (11.6).
If we do this for each set and add the $ab'$ blocks so obtained to the blocks of
(11.5) we get the solution of a GD design with parameters

(11.7)   $v'' = mn$, $b'' = b + b'a$, $r'' = r + r'$, $k'' = k$,
         $m'' = a$, $n'' = m'n$, $\lambda_1''$, $\lambda_2'' = \lambda_2$,

where the treatments occurring in a set now belong to the same group. Obviously
every treatment occurs $r + r'$ times in the final design, but we have to show that
any two treatments belonging to the same set occur together $\lambda_1''$ times, and any
two treatments belonging to different sets occur together $\lambda_2$ times.

If two treatments belong to the same set, they either occur together in the
same group or in different groups. In the first case they occur together in $\lambda_1$
blocks obtained from (11.5) and in $\lambda_1'$ blocks obtained from (11.6). In the second
case they occur together in $\lambda_2$ blocks obtained from (11.5) and $\lambda_2'$ blocks
obtained from (11.6). It follows from (11.65) that in either case they occur
together $\lambda_1''$ times.

Again if two treatments belong to different sets they will occur together in
$\lambda_2$ blocks obtained from (11.5) and in no blocks obtained from (11.6). This
completes the proof.
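
A small numerical check of the method is sketched below. The designs used are hypothetical choices made for this sketch, not examples from the text: (11.5) is taken as the design from an orthogonal array [9, 4, 3, 2] ($\lambda_1 = 0$, $\lambda_2 = 1$, $k = 4$), and (11.6) as the six blocks formed by a whole group together with one treatment of the other group of its set ($\lambda_1' = 3$, $\lambda_2' = 2$), so that (11.65) holds with $\lambda_1'' = 3$.

```python
from collections import Counter
from itertools import combinations, product

n, m, a = 3, 4, 2            # hypothetical sizes: four groups of three, split into two sets
m_sub = m // a
groups = [tuple(range(i * n, i * n + n)) for i in range(m)]

# (11.5): columns of the orthogonal array with rows x, y, x + y, x + 2y (mod 3).
base = [tuple(i * n + level for i, level in enumerate((x, y, (x + y) % n, (x + 2 * y) % n)))
        for x, y in product(range(n), repeat=2)]

sets = [groups[j * m_sub:(j + 1) * m_sub] for j in range(a)]

def sub_design(g, h):
    """(11.6) on the two groups g, h of one set: a whole group plus one treatment of the other."""
    return [g + (y,) for y in h] + [(x,) + h for x in g]

blocks = base + [blk for g, h in sets for blk in sub_design(g, h)]
set_of = {t: j for j, grp_set in enumerate(sets) for grp in grp_set for t in grp}
counts = Counter(pair for blk in blocks for pair in combinations(blk, 2))
reps = Counter(t for blk in blocks for t in blk)
pairs = list(combinations(range(m * n), 2))
lam1 = {counts[q] for q in pairs if set_of[q[0]] == set_of[q[1]]}   # same set: new groups
lam2 = {counts[q] for q in pairs if set_of[q[0]] != set_of[q[1]]}
print(len(blocks), set(reps.values()), lam1, lam2)   # 21 {7} {3} {1}
```

The combined design has $b'' = 9 + 2 \cdot 6 = 21$, $r'' = 3 + 4 = 7$, two groups of six, $\lambda_1'' = 3$ and $\lambda_2'' = 1$, in agreement with (11.7).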

As an illustration let us start with the GD design with parameters

(11.75)  $v = 12$, $b = 20$, $r = 5$, $k = 3$,
         $m = 6$, $n = 2$, $\lambda_1 = 0$, $\lambda_2 = 1$,

the blocks of which are given by the columns of (5.7) and the groups by (5.8).

Let us take $a = 2$, and let the first three groups belong to the first set and the
last three groups to the second set. Also as noted in Section 10 a solution of the
GD design with parameters

(11.8)   $v' = 6$, $b' = 6$, $r' = 3$, $k' = 3$,
         $m' = 3$, $n' = 2$, $\lambda_1' = 2$, $\lambda_2' = 1$

is given by the last six columns of (10.6), and the groups by the last three
columns of (10.65). We note that $\lambda_1 + \lambda_1' = \lambda_2 + \lambda_2' = 2$. Hence we can build up
a solution of

(11.85)  $v = 12$, $b = 32$, $r = 8$, $k = 3$,
         $m = 2$, $n = 6$, $\lambda_1 = 2$, $\lambda_2 = 1$

by adding to the solution of (11.75) a solution of (11.8) twice over, identifying
$a_0, a_1, a_2, a_3, a_4, a_5$ once with 5, 11, 2, 7, 6, 8 and next with 9, 10, 4, 12, 1, 3
respectively. Thus the 32 blocks of (11.85) are given by the 20 columns of the
scheme (5.7) together with the twelve columns of the following scheme

          5  11   2   7   6   8   9  10   4  12   1   3
(11.9)   11   2   7   6   8   5  10   4  12   1   3   9
          7   6   8   5  11   2  12   1   3   9  10   4.

The first group consists of the treatments 5, 11, 2, 7, 6, 8 and the second
group consists of 9, 10, 4, 12, 1, 3.

The parameters of GD designs obtainable by extension and addition are
shown in Table X.


REFERENCES 


[1] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced
incomplete block designs with two associate classes," J. Amer. Stat. Assn., Vol.
47 (1952), pp. 151-184.
[2] R. C. Bose and W. S. Connor, "Combinatorial properties of group divisible designs,"
Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.
[3] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhya,
Vol. 4 (1939), pp. 337-372.
[4] K. R. Nair and C. R. Rao, "A note on partially balanced incomplete block designs,"
Science and Culture, Vol. 7 (1942), pp. 568-569.
[5] C. R. Rao, "General methods of analysis for incomplete block designs," J. Amer.
Stat. Assn., Vol. 42 (1947), pp. 541-561.
[6] F. Yates, "Incomplete randomized blocks," Ann. Eugenics, Vol. 7 (1936), pp. 121-140.
[7] R. A. Fisher and F. Yates, Statistical Tables, 3rd ed., Hafner Publishing Co., New
York, 1948.
[8] R. C. Bose, "On the construction of balanced incomplete block designs," Ann.
Eugenics, Vol. 9 (1939), pp. 353-399.
[9] R. C. Bose, "Some new series of balanced incomplete block designs," Bull. Calcutta
Math. Soc., Vol. 34 (1942), pp. 17-31.
[10] R. C. Bose, "A note on the resolvability of balanced incomplete block designs,"
Sankhya, Vol. 6 (1942), pp. 105-110.
[11] W. G. Cochran and G. M. Cox, Experimental Designs, John Wiley and Sons, 1950.
[12] R. C. Bose, "An affine analogue of Singer's theorem," J. Indian Math. Soc. (new series),
Vol. 6 (1942), pp. 1-15.
[13] J. Singer, "A theorem in finite projective geometry and some applications to number
theory," Trans. Amer. Math. Soc., Vol. 43 (1938), pp. 377-385.
[14] C. R. Rao, "Difference sets and combinatorial arrangements derivable from finite
geometries," Proc. Nat. Inst. Sci. India, Vol. 12 (1946), pp. 123-135.
[15] R. C. Bose, "On a resolvable series of balanced incomplete block designs," Sankhya,
Vol. 8 (1947), pp. 249-256.
[16] C. R. Rao, "Hypercubes of strength d leading to confounded designs in factorial
experiments," Bull. Calcutta Math. Soc., Vol. 38 (1946), pp. 67-78.
[17] C. R. Rao, "Factorial experiments derivable from combinatorial arrangements of
arrays," J. Roy. Stat. Soc., Suppl. Vol. 9 (1947), pp. 128-139.
[18] C. R. Rao, "On a class of arrangements," Proc. Edinburgh Math. Soc., Vol. 8 (1949),
pp. 119-125.
[19] R. L. Plackett and J. P. Burman, "The design of optimum multifactorial
experiments," Biometrika, Vol. 33 (1943-46), pp. 305-325.
[20] K. A. Bush, "A generalization of a theorem due to MacNeish," Ann. Math. Stat.,
Vol. 23 (1952), pp. 293-295.
[21] K. A. Bush, "Orthogonal arrays of index unity," Ann. Math. Stat., Vol. 23 (1952),
pp. 426-434.
[22] R. C. Bose and K. A. Bush, "Orthogonal arrays of strength two and three," Ann.
Math. Stat., Vol. 23 (1952), pp. 508-524.
[23] R. C. Bose, "The design of experiments: Presidential address to the section of
Statistics, 34th Indian Science Congress" (1947).
[24] R. C. Bose, "Least square aspects of the analysis of variance," Institute of Statistics,
University of North Carolina, Mimeo. Series 9.





THE BASIC THEOREMS OF INFORMATION THEORY 
By Brockway McMILLAN 


Bell Telephone Laboratories 


Summary. This paper describes briefly the current mathematical models upon 
which communication theory is based, and presents in some detail an exposition 
and partial critique of C. E. Shannon’s treatment of one such model. It then 
presents a general limit theorem in the theory of discrete stochastic processes, 
suggested by a result of Shannon’s. 


1. General models of the communication problem. 

1.0. Introduction. For the purposes of this exposition, information theory 
is the body of statistical mathematics which has developed, largely over the 
last decade, out of efforts to understand and improve the communications art. 
We shall not attempt a history of this development, nor any detailed justification 
for its existence, since either of these efforts would take us further into the tech- 
nics of communication than is desirable in a short essay. 

It suffices to say here that this discipline has come specifically to the attention 
of mathematicians and mathematical statisticians almost exclusively through 
the book [1] of N. Wiener and the paper [2] of C. E. Shannon. 

In the remainder of this section we shall describe very broadly the kind of 
problem to which these two works are addressed. 

1.1. A simple model. The simplest mathematical model of the communication
problem is like the problem of parameter estimation. A parameter $\theta$, usually
ranging over a fairly abstract or at least multi-dimensional domain, represents
the transmitted message. A variable, y, also fairly abstract in general, represents
the received message. In realistic situations the received message is seldom a
mathematically exact copy, or even an exactly predictable mutilation, of the
original transmitted message. Hence, y is represented as a random variable whose
distribution depends upon the parameter $\theta$. The communication problem then
is: given a sample of one value of y, to estimate the unknown $\theta$.

There are two reasons why this model may not seem at first look to be a good 
one for the communication problem. One is merely that our most usual media 
of communication, direct acoustic transmission of voice and the written or printed 
word, are ones in which essentially exact transmission is possible and we are not 
aware of the underlying statistical nature of the problem. This is clearly a mat- 
ter of degree, however, and almost anyone can find in his own experience in- 
stances in which the statistical aspect of the problem was evident. 

Another apparent failing of this model is in fact real, and has led to refinement
of the model. There are communication problems, mostly in technical fields,
where it is realistic to assume that the recipient of y has no a priori knowledge
about the parameter $\theta$. The usual situation in human experience, however, is
one in which there is a great deal of a priori knowledge about the possible values
of $\theta$. There are simple experiments with mutilated text, spoken or written, which
will convince one that he can, and often does, exploit his own a priori knowledge
of language, speaker, and subject matter to assist in deciphering what he reads
and hears. A realistic model must include this possibility.

Received 8/9/51.

1.2. Stochastic transmitted message. It was Wiener who first clearly pointed out
that we may, and indeed often must, regard the transmitted message itself as a
random variable drawn from a universe whose distribution function reflects our
a priori knowledge of the situation. Cogent statements of this philosophy may
be found both in [1] and [2]. This leads us to a model in which we have two
abstract random variables, say x representing the transmitted message (replacing
the parameter $\theta$), and y the received message. There is then a joint
distribution function for x and y which contains in it the complete mathematical
description of the situation. One ordinarily thinks of this distribution function
as being “factored” into an a priori distribution for x, representing the universe
of possible messages, and a conditional distribution for y knowing x, representing
for each x the universe of possible mutilations thereof.

In this second and more important model, one can still regard the communication
problem as one of estimation: given the y value of a joint sample (x, y),
to estimate the x value. This view is particularly appropriate in discussing the
work of [1]. Here, the x and y are numerically valued time series and there is a
natural numerical way to measure the deviation between the estimated and true
values of x, namely, by the variance of estimate.

The statistician may alternatively wish to regard the communication problem
(in either model) as one of testing hypotheses. The observed y has a distribution
depending on the hypothesis “x”; the problem is to decide which x holds
at the time of observation. This view is more appropriate to the work of [2],
wherein the time series are abstract valued, and no natural measure of the
“wrongness” of an incorrectly adopted hypothesis is available. In the second
model, the a priori distribution for hypotheses x eliminates one kind of testing
error, so that in this model there is a simple criterion of performance, namely,
the total probability in the (x, y) universe of all events (x, y) in which the
hypothesis adopted is correct. The reader will observe this particular criterion in
sections 6 and 8.

The distinction between estimation, on the one hand, and testing among 
many hypotheses, on the other, is not sharp. We shall use “estimation” as a 
loose word to refer to the kind of model here set up for communication. 

1.3. Peculiarities of engineering applications. Information theory is distinguished
from a general study of models like these in two important respects. In
the first place, as noted, the random quantities x and y of interest are, naturally,
time series. Furthermore, the passage of time is explicitly recognized and the
distinction between past events, which can be known, and future events, which
cannot be known, is carefully observed.







In the second place, the kind of question considered in information theory, 
particularly by Shannon, reflects the peculiar interests of communication engi- 
neers. To illustrate this, we might go back to the jointly distributed abstract 
variables x and y of 1.2 above and the estimation problem there stated: given a 
sample of one value of y, to estimate the corresponding x. Typically, a practicing 
statistician facing this kind of problem will find himself confronted with a given 
joint distribution function for the variables, or at least committed to choosing 
one which he thinks is representative, and his attention is directed toward such 
questions as the following. 

a. By what criterion shall various estimates of x be compared? 

b. Given the criterion, what is the best estimate of x which can be made, and 
how good is it? 

c. How do competing methods of estimating x compare with the best? 

These questions of course appear in a communication context, too. The entire 
effort of [1] is concentrated in this general area. It often happens, however, that 
the communication engineer has a freedom that the statistician seldom has, that 
of controlling, at least in part, the joint distribution with which he must deal. 
How this comes about will be discussed in a moment. We can see at once, how- 
ever, that his interest in question (b) above will then extend to asking, in addi- 
tion, how he can optimize his best estimate over the additional freedom he has. 

1.4. The additional freedom. The additional freedom enjoyed by a communication
engineer is like the freedom granted the designer of an experiment. Typically,
technology provides the engineer with a communicating device or medium: a
random variable y whose range Y represents, as above, the events which can
take place at the receiving point, and a probability distribution for y which
depends upon a parameter $\theta$. As above, the range $\Theta$ of $\theta$ represents the possible
events at the transmitting point. In addition, one is given a quite separate
random variable x whose range X is the universe of possible messages with a
probability measure appropriate thereto.

No relation is yet specified between the message x and the “stimulus” $\theta$ which
is applied to the communication medium, and it is here that the extra freedom
lies. Subject to limitations set by the necessary distinction between past and
future, one is free to choose a mapping function f(x) from X into $\Theta$, $\theta = f(x)$.
This corresponds to choosing some kind of encoding or modulation scheme
transforming the original message into a form suitable for transmission.

To illustrate the effect of this, suppose that the distribution function of y has
a density $p(\theta; y)$ with respect to some fixed underlying measure $\nu$ in the y universe,
and that the distribution of x has a density $\sigma(x)$ with respect to some underlying
measure $\mu$ in the x universe. Then if one fixes the relation above between x and
$\theta$, the function $\sigma(x) p(f(x); y)$ in $X \otimes Y$ represents the density of the resulting
joint distribution of x and y relative to the product measure $\mu \otimes \nu$. It is this joint
distribution with which the communication engineer works.
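The construction of the joint density can be sketched for a finite case, where the densities become ordinary probability tables. Everything in the sketch is an illustrative assumption and not taken from the paper: the three-letter message set, the two-point stimulus range, the numerical values, and the names `sigma`, `f`, `p` and `joint_density`.

```python
# Hypothetical toy setup; none of these numbers come from the paper.
sigma = {"a": 0.5, "b": 0.3, "c": 0.2}   # a priori density of the message x on X
f = {"a": 0, "b": 1, "c": 1}             # chosen encoding f: X -> Theta = {0, 1}
p = {0: {0: 0.9, 1: 0.1},                # p[theta][y]: density of y for stimulus theta
     1: {0: 0.2, 1: 0.8}}

def joint_density(sigma, f, p):
    """Return the table (x, y) -> sigma(x) * p(f(x); y)."""
    return {(x, y): sigma[x] * p[f[x]][y] for x in sigma for y in p[f[x]]}

joint = joint_density(sigma, f, p)

# The joint table is again a probability distribution, now on X x Y.
total = sum(joint.values())
```

Changing the table `f` changes `joint` while `sigma` and `p` stay fixed, which is the freedom of design described above.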

To the practising engineer, the most interesting theorems of Shannon’s
paper relate to what can be achieved by varying the encoding process represented
by the function f(x). The strong theorems now known are all of an asymptotic
kind.

1.5. Role of Fourier analysis. Even the casual reader will observe in [1], and 
in the latter part of [2], a preoccupation with Fourier analysis. It may be well 
to point out that this is a kind of accident; it happens that most practical com- 
munication media are governed by linear time-invariant differential equations. 
Hence, the first applications of information theory have been to systems which 
are naturally best handled by the tools of Fourier or Laplace analysis. 


2. Terminology and concepts. 

2.0. Limitation to discrete model. We shall confine our attention to the first 
part of Shannon’s paper [2]. This whole paper relates to the second model of the 
communication problem described above, with an emphasis on the kind of ques- 
tion discussed in 1.3. The first part of that paper is based on a fairly specific 
kind of model. The stochastic processes which it admits are all derived from Mar- 
kov processes having finitely many states. The auxiliary devices, encoders, etc., 
which are admitted are defined by similar constructions. We adopt the term
“finitary” to denote a restriction to these classes of objects without at this point
repeating Shannon’s definitions in detail. (There is a restriction, tacit in [2] but
nowhere made explicit, to devices whose graphs have the property that the
terminal state of any transition is uniquely fixed when the initial state and the
letter emitted are given. For the present, we take “finitary” to include this
limitation.)

The central concepts of [2] may be introduced well enough here by a glossary 
of terms. At this purely descriptive level, we may be quite general and admit 
things which are not finitary. 

2.1. Sample space and measurable sets. Let A be a finite set. We call such a set
an alphabet and will have occasion to introduce further alphabets $A_1$, B, etc.
These are all abstract finite sets. An element of A will be called a letter of A,
or simply a letter when no ambiguity results.

Let $I$ denote the set of integers: $I = (\ldots, -1, 0, 1, 2, \ldots)$.

Given an alphabet A, denote by $A^I$ the class of infinite sequences

    $x = (\ldots, x_{-1}, x_0, x_1, x_2, \ldots)$

where each $x_t \in A$, $t \in I$. Here x is an element of $A^I$, and we call $x_t$ the
letter of x at time t.

A basic set (in $A^I$) is a subset of $A^I$ obtained by specifying

(i) an integer $n \ge 1$,

(ii) a finite sequence $\alpha_0, \alpha_1, \ldots, \alpha_{n-1}$ of letters $\alpha_k \in A$,

(iii) an integer t, $-\infty < t < \infty$.

The basic set resulting from this specification consists of all sequences $x \in A^I$
such that

    $x_{t+k} = \alpha_k, \qquad 0 \le k \le n - 1.$

Let $F_A$ be the Borel field of subsets of $A^I$ determined by the basic sets.

2.2. Glossary. Our glossary now reads:
2.21. Information source. If $\mu$ is a probability measure defined over the Borel
field $F_A$, the ensemble or stochastic process $[A^I, F_A, \mu]$ is an information source.
Since the space $A^I$ is fixed by the alphabet, and the Borel field $F_A$ is always that
determined by the basic sets, we can specify a source by the pair of symbols
$[A, \mu]$.

2.22. Stationary and ergodic sources. Consider a source $[A, \mu]$. Let $T$ be the
coordinate-shift transformation defined as follows. If $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$
then $Tx = (\ldots, x'_{-1}, x'_0, x'_1, \ldots)$, where $x'_t = x_{t+1}$, $t \in I$. Then $T$ preserves
membership in $F_A$ (measurability). The source will be called stationary if (i)
below holds, and ergodic if (i) and (ii) both hold.

(i) If $S \in F_A$, then $\mu(S) = \mu(TS)$.

(ii) If $S = TS$, then either $\mu(S) = 0$ or $\mu(S) = 1$.

2.23. Transducer. A transducer is characterized by two alphabets, A and B,
and a function $\tau$ from $A^I$ to $B^I$: given $x \in A^I$, $\tau(x) \in B^I$. A transducer differs from
a general functional relationship in that it cannot anticipate.

If $x^{(1)} \in A^I$ and $x^{(2)} \in A^I$ and $t_0$ is an integer such that

    $x^{(1)}_t = x^{(2)}_t$ for $t \le t_0$,

then

    $y^{(1)}_t = y^{(2)}_t$ for $t \le t_0$,

where

    $y^{(i)} = \tau(x^{(i)}), \qquad i = 1, 2.$

We can specify a transducer by the symbol $[A, \tau, B]$.
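The requirement that a transducer cannot anticipate (inputs that agree up to a given time must give outputs that agree up to that time) can be checked by brute force on short finite windows. The sketch below is only an illustration under stated assumptions: finite tuples indexed from 0 stand in for the doubly infinite sequences of the text, and the two maps `running_parity` and `look_ahead` as well as the checker `is_non_anticipating` are hypothetical names introduced here.

```python
from itertools import product

def running_parity(x):
    """Causal map: the output letter at time t is the parity of x_0, ..., x_t."""
    out, acc = [], 0
    for letter in x:
        acc ^= letter
        out.append(acc)
    return tuple(out)

def look_ahead(x):
    """Anticipating map: the output letter at time t is x_{t+1} (padded with 0)."""
    return tuple(x[1:]) + (0,)

def is_non_anticipating(tau, alphabet, length):
    """Check on all windows of the given length that inputs agreeing
    up to time t0 give outputs agreeing up to time t0."""
    seqs = list(product(alphabet, repeat=length))
    for x1, x2 in product(seqs, repeat=2):
        y1, y2 = tau(x1), tau(x2)
        for t0 in range(length):
            if x1[:t0 + 1] == x2[:t0 + 1] and y1[:t0 + 1] != y2[:t0 + 1]:
                return False
    return True

causal_ok = is_non_anticipating(running_parity, (0, 1), 4)
anticipating_ok = is_non_anticipating(look_ahead, (0, 1), 4)
```

A finite check of this kind can only refute the property on the windows examined; it is not a proof for sequences on the whole of $I$.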

2.24. Channel or communication channel. A channel is characterized by two
alphabets A and B, and a list of probability measures $\nu_\theta$ defined over $F_B$, one
for each $\theta \in A^I$. Here we have used $\theta$ to denote the “parameter” in conformance
with an earlier notation.

Like a transducer, a channel cannot anticipate. That is, informally, if

(1)    $\theta^{(1)}_t = \theta^{(2)}_t$ for $t \le t_0$,

we must have

(2)    $\nu_1(S) = \nu_2(S)$,

where $\nu_i(S)$ denotes the value of $\nu_\theta(S)$ when $\theta = \theta^{(i)}$, for any set $S \in F_B$ which
depends only on letters occurring before $t_0 + 1$. More precisely stated, (1) must
imply (2) for any set $S \in F_B$ such that “$y^{(1)}_t = y^{(2)}_t$ for $t \le t_0$ and $y^{(1)} \in S$”
implies “$y^{(2)} \in S$.”

A transducer is a special case of a channel; it is a channel in which the received
signal y is determined exactly by the transmitted signal $\theta$.

We can specify a channel by the symbol $[A, \nu_\theta, B]$.







2.25. Stationarity. The concept of stationarity extends to channels and transducers.
It suffices to define a stationary channel; a stationary transducer is a
special case. Referring to the definition of a channel, this channel will be called
stationary if, for any $S \in F_B$, $\nu_\theta(S) = \nu_{T\theta}(TS)$, where $T$ is the coordinate-shift
transformation.

2.26. We have so worded the definitions above that all sources are “letter 
generators” producing one new letter for each unit of time, and channels and 
transducers accept and produce one letter for each unit of time. In a careful 
setting of the theory, one must account for the phenomena of compression and 
expansion which appear when languages are translated. For example, a long 
business message of fairly stereotyped form, when encoded for transmission by 
cable, may appear in a form having many fewer letters or words than the original. 
There are several ways of accommodating the mathematics to this situation, 
but these details are unimportant in a first look at the subject and will be ignored 
from here on. The fact of so ignoring them does not invalidate any theorem that 
will be stated. It merely leaves a gap between these theorems and certain useful 
interpretations of them. 


3. Entropy. 

3.0. Entropy. The terms defined in Section 2, suitably hedged, are the con- 
cepts with which [2] deals. (For purposes of exposition, we have defined channels 
and transducers quite differently from [2]. The disparity is largely but not en- 
tirely verbal. (Cf. 10.2.)) The principal tool for their quantitative study is the 
concept of entropy. 

Let $p_1, p_2, \ldots, p_n$ be a finite and exhaustive list of probabilities: $p_i \ge 0$,
$1 \le i \le n$, $p_1 + p_2 + \cdots + p_n = 1$. The entropy of this list is defined to be

    $H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i = \text{Expectation}(-\log p).$


It is by now traditional to use logs to the base 2 in this definition, but the choice 
of base affects the value of H only by a constant factor. We shall use the base 2. 
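As a minimal sketch of this definition, the function below computes the entropy of a finite list of probabilities. The name `entropy`, the validation tolerance, and the convention that terms with $p_i = 0$ contribute nothing (the limiting value of $p \log p$) are assumptions of the sketch, not statements from the paper.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum p_i log p_i of an exhaustive list of probabilities.

    Terms with p_i = 0 are skipped, using the convention 0 log 0 = 0.
    """
    probs = list(probs)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be nonnegative and sum to 1")
    bits = -sum(p * math.log2(p) for p in probs if p > 0)
    # A change of base only rescales the value by a constant factor.
    return bits / math.log2(base)

fair_coin_bits = entropy([0.5, 0.5])
fair_coin_nats = entropy([0.5, 0.5], base=math.e)
```

The ratio `fair_coin_nats / fair_coin_bits` is the constant factor between the two bases, here the natural logarithm of 2.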

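3.1. Marginal entropies. To change the notation slightly, suppose that $\alpha$ and
$\beta$ run over finite index sets (alphabets) A and B, and that $p(\alpha, \beta)$ is the probability
of the joint event $(\alpha, \beta)$. That is, $p(\alpha, \beta) \ge 0$, $\sum_{\alpha \in A} \sum_{\beta \in B} p(\alpha, \beta) = 1$.
The entropy of this list of probabilities is denoted by $H(\alpha, \beta)$:

    $H(\alpha, \beta) = -\sum_{\alpha} \sum_{\beta} p(\alpha, \beta) \log p(\alpha, \beta).$

We can define also two marginal entropies

    $H(\alpha) = -\sum_{\alpha} \sum_{\beta_1} p(\alpha, \beta_1) \log \Big( \sum_{\beta} p(\alpha, \beta) \Big),$
    $H(\beta) = -\sum_{\beta} \sum_{\alpha_1} p(\alpha_1, \beta) \log \Big( \sum_{\alpha} p(\alpha, \beta) \Big),$

and two average conditional entropies

    $H_\beta(\alpha) = H(\alpha, \beta) - H(\beta),$
    $H_\alpha(\beta) = H(\alpha, \beta) - H(\alpha).$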


3.2. Average conditional entropy. These latter are called average conditional
entropies because of the following formula: fix $\beta$ and consider the conditional
probabilities for the various $\alpha \in A$. These are $q_\beta(\alpha) = p(\alpha, \beta) / \sum_{\alpha_1} p(\alpha_1, \beta)$.
The entropy of this list is

(1)    $-\sum_{\alpha} q_\beta(\alpha) \log q_\beta(\alpha) = -\frac{1}{r(\beta)} \sum_{\alpha} p(\alpha, \beta) \log p(\alpha, \beta) + \frac{1}{r(\beta)} \sum_{\alpha} p(\alpha, \beta) \log \sum_{\alpha_1} p(\alpha_1, \beta),$

where $r(\beta)$ is defined by (2) below.
This expression is the entropy of the conditional distribution of $\alpha$ when it is
known that a particular $\beta$ has occurred. The a priori probability of this $\beta$ is

(2)    $\sum_{\alpha_1} p(\alpha_1, \beta) = r(\beta).$

To average (1) over all $\beta$, we multiply it by (2) and sum over $\beta$. The result is
seen to be $H_\beta(\alpha)$. This last entropy, then, is the average over all $\beta$ of the entropies
of the conditional distribution of $\alpha$ when $\beta$ is known.
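The averaging argument of 3.2 can be checked numerically on a small joint table. In the sketch below the joint probabilities and the names `H`, `r` and `conditional_entropy_given` are hypothetical choices made for illustration; the table is assumed to give every second-coordinate value a nonzero total probability, so that the conditional distributions are defined.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint probabilities p(alpha, beta).
joint = {("a1", "b1"): 0.4, ("a1", "b2"): 0.1,
         ("a2", "b1"): 0.1, ("a2", "b2"): 0.4}

def r(joint, b):
    """A priori probability of b: the sum of p(alpha, b) over alpha."""
    return sum(p for (_, b_), p in joint.items() if b_ == b)

def conditional_entropy_given(joint, b):
    """Entropy of the conditional distribution of alpha when b has occurred."""
    rb = r(joint, b)
    return H([p / rb for (_, b_), p in joint.items() if b_ == b])

betas = sorted({b for (_, b) in joint})

# Average of the conditional entropies, weighted by the a priori probabilities.
averaged = sum(r(joint, b) * conditional_entropy_given(joint, b) for b in betas)

# The same quantity obtained as joint entropy minus marginal entropy.
by_difference = H(joint.values()) - H([r(joint, b) for b in betas])
```

The two values `averaged` and `by_difference` agree up to rounding error, which is the identity that justifies the name "average conditional entropy".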

3.3. Properties. Shannon [2] gives a fairly complete heuristic justification for
regarding the entropy of a list of probabilities as a measure of one’s a priori
uncertainty as to which of the possible events will actually occur in a given trial.
In the course of this demonstration, he introduces the most important mathematical
properties of the H function. These are (i) its positivity, (ii) a kind of convexity
property implied by the convexity of the function $-x \log x$, (iii) that
composition law which permitted the identification above of the average value
of (1) over $\beta$, with the earlier defined $H_\beta(\alpha)$, and (iv) $H = 0$ if and only if there
is exactly one event of nonzero probability.

The convexity property (ii) mentioned above leads to the general inequality
$H_\beta(\alpha) \le H(\alpha)$; that is, verbally, a condition (i.e. an a priori restriction on the
“freedom of choice”) never increases an entropy. This statement must however
be taken only in the average sense in which it is stated: for any particular $\beta$,
the entropy of the conditional distribution of $\alpha$ bears no provable relation to the
marginal entropy $H(\alpha)$. It is only in the average over all $\beta$ that an inequality
obtains.
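A small numerical case may make the distinction concrete. In the hypothetical table below (the numbers and names are assumptions chosen for illustration), one rare value of the condition leaves the first variable completely uncertain, so the conditional entropy for that particular value exceeds the marginal entropy, while the average over both values of the condition does not.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint probabilities p(alpha, beta), alpha in {0, 1}.
joint = {(0, "common"): 0.90, (1, "common"): 0.00,
         (0, "rare"): 0.05, (1, "rare"): 0.05}

def r(b):
    """A priori probability of the condition b."""
    return sum(p for (_, b_), p in joint.items() if b_ == b)

def conditional_entropy_given(b):
    """Entropy of alpha when the particular condition b is known to hold."""
    return H([p / r(b) for (_, b_), p in joint.items() if b_ == b])

marginal = H([sum(p for (a_, _), p in joint.items() if a_ == a) for a in (0, 1)])
given_rare = conditional_entropy_given("rare")
average = sum(r(b) * conditional_entropy_given(b) for b in ("common", "rare"))
```

Here `given_rare` is larger than `marginal`, yet `average` is smaller, in agreement with the statement that the inequality holds only on the average.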


4. The entropy rate of a source. 

4.0. Definition. So far we have considered the entropy of a list of probabilities.
The entropy rate of a stationary source $[A, \mu]$ is most easily defined as follows.
Given $x \in A^I$, we use either of the bracket notations

(1)    $[x_t, x_{t+1}, \ldots, x_{t+n-1}], \qquad [t, t + n - 1; x]$

to denote that basic set $S \subset A^I$ which consists of all $x'$ such that

    $x'_{t+k} = x_{t+k}, \qquad 0 \le k \le n - 1.$

The second notation will be used when it is to be emphasized that the basic set
depends upon a particular infinite sequence x.

The possible basic sets (1), as x ranges over $A^I$, or, alternatively, as the $x_{t+k}$,
$0 \le k \le n - 1$, range independently over A, partition $A^I$ into $a^n$ measurable
subsets, where a is the number of letters in the alphabet A. These subsets represent
all the possible sequences of n consecutive letters. They have the respective
probabilities

(2)    $\mu([x_t, x_{t+1}, \ldots, x_{t+n-1}]).$

Our stationarity assumption makes this list of probabilities independent of t.
There is then a unique number $F_n$, independent of t, which is the entropy of this
list (2) of probabilities. We shall show presently that the limit

(3)    $\lim_{n \to \infty} \frac{1}{n} F_n$

always exists. The value of this limit is defined to be the entropy rate of the
source $[A, \mu]$.

4.1. Interpretation. One cannot escape the heuristic meaning of this rate; one
considers the possible long sequences of text as his universe of events, and evaluates
the uncertainty $F_n$ of the outcome of a trial. This uncertainty is then prorated
among the n letters. These letters represent interdependent but possibly
not determinately related elementary events whose concatenation generates the
universe. The result, $F_n/n$, represents in the limit the average uncertainty per
letter generated by the source.

4.2. Defining $F_n$ as an integral. We shall now prove the existence of the limit
(3). The proof follows Shannon’s in a different notation.

Given any $x \in A^I$, the basic set (1) defined by that x contains x. The probability
(2) then may be regarded as a step function of x, equal for each x to the probability
of that basic set containing x which is specified by letter values at times
$t, t + 1, \ldots, t + n - 1$. In the same way, the definition

(4)    $f_n(x) = -\frac{1}{n} \log \mu([x_0, x_1, \ldots, x_{n-1}])$

defines a nonnegative step function of x. One verifies at once from the definition
of $F_n$ that

(5)    $\frac{1}{n} F_n = \int f_n(x) \, d\mu(x).$

Regarding (2) and (4) as functions of x in this way permits us to phrase certain
key problems in the language of integration theory.







4.3. Another definition of H. Consider now the special conditional probabilities

(6)    $p_n(x) = \frac{\mu([x_{-n}, x_{-n+1}, \ldots, x_{-1}, x_0])}{\mu([x_{-n}, x_{-n+1}, \ldots, x_{-1}])}, \qquad n \ge 1.$

Again we use the device of representing these as step functions of x. In words,
$p_n(x)$ is the conditional probability of observing at time zero the letter $x_0$ of x,
when it is known that the letters occurring at times $-n, -n + 1, \ldots, -1$
are exactly those of x.

Define

(7)    $g_0(x) = f_1(x), \qquad g_n(x) = -\log p_n(x), \quad n \ge 1.$

Then $g_n(x) \ge 0$.
One verifies by direct calculation from (6) and (7) that

(8)    $G_n = \int g_n(x) \, d\mu(x)$

is the average conditional entropy of the next letter when n preceding letters are
known. The inequality stated earlier, that adjoining a condition cannot increase
an entropy, can be used to show that the $G_n$ form a monotone sequence:

    $G_0 \ge G_1 \ge G_2 \ge \cdots \ge 0.$

Therefore

(9)    $\lim_{n \to \infty} G_n = H$

certainly exists. The verbal interpretation of $G_n$, the average conditional entropy
of the next letter after a long segment of text is already known, suggests
that the limit H in (9) is again the average uncertainty per letter generated by
the source, that is, H is the entropy rate defined in (3). The proof in 4.4 below
that this is indeed so, proves the existence of the limit (3).

4.4. Identification of two definitions. By a direct calculation from the definitions
it is found that

(10)    $f_N(x) = \frac{1}{N} \sum_{k=0}^{N-1} g_k(T^k x).$

If one integrates this and uses the assumed stationarity of $\mu$, he obtains

(11)    $\frac{1}{N} F_N = \frac{1}{N} (G_0 + G_1 + \cdots + G_{N-1}).$

Therefore $F_N/N$ represents the first Cesàro mean of a monotonely convergent
sequence. It follows that the limit (3) exists and indeed is approached monotonely.
A further consequence of (11) is that $F_N/N \ge G_N \ge H$.
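The relation between the block entropies and the conditional entropies can be verified numerically for a simple stationary source. The sketch below assumes a two-state Markov source started from its stationary distribution; the transition table `P`, the distribution `PI`, and the function names `mu`, `F` and `G` are illustrative assumptions, with `F(n)` the entropy of the list of probabilities of blocks of n letters and `G(n)` the average conditional entropy of the next letter when n preceding letters are known.

```python
import math
from itertools import product

# Hypothetical two-letter Markov source; PI satisfies PI = PI * P (stationarity).
P = [[0.9, 0.1],
     [0.4, 0.6]]
PI = [0.8, 0.2]

def mu(block):
    """Probability of the basic set fixed by a block of consecutive letters."""
    prob = PI[block[0]]
    for a, b in zip(block, block[1:]):
        prob *= P[a][b]
    return prob

def F(n):
    """Entropy (bits) of the list of probabilities of all blocks of n letters."""
    return -sum(mu(b) * math.log2(mu(b)) for b in product(range(2), repeat=n))

def G(n):
    """Average conditional entropy of the next letter given n preceding letters."""
    if n == 0:
        return F(1)
    total = 0.0
    for block in product(range(2), repeat=n + 1):
        p_n = mu(block) / mu(block[:-1])   # conditional probability of the last letter
        total -= mu(block) * math.log2(p_n)
    return total

# Block entropy as a sum of conditional entropies, and the per-letter means.
sums_match = all(abs(F(N) - sum(G(k) for k in range(N))) < 1e-9 for N in range(1, 7))
per_letter = [F(N) / N for N in range(1, 7)]
conditional = [G(n) for n in range(0, 7)]
```

For this source the list `conditional` is constant from its second entry on, as one expects of a Markov source, and `per_letter` decreases toward that constant value.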







5. The capacity of a channel. 
5.0. Channel and source. We wish now to examine a stationary channel “driven”
by a stationary source. Consider a source $[A, \mu]$, and a channel $[A, \nu_\theta, B]$. Denote
by $C = A \otimes B$ the alphabet of pairs $(\alpha, \beta)$, $\alpha \in A$, $\beta \in B$. Then $C^I$ is the class of
all infinite sequences

    $(\ldots, (x_{-1}, y_{-1}), (x_0, y_0), (x_1, y_1), \ldots)$

where $x_t \in A$, $y_t \in B$, $t \in I$. In an obvious way we can also regard $C^I = A^I \otimes B^I$,
that is, as the class of paired sequences (x, y), $x \in A^I$, $y \in B^I$. It is known that
the Borel field $F_C$ is determined by the sets $X \otimes Y$ where $X \in F_A$, $Y \in F_B$.
We define a measure $\omega$ for sets in $F_C$ by the formula

    $\int_{C^I} h(x, y) \, d\omega(x, y) = \int_{A^I} d\mu(x) \int_{B^I} h(x, y) \, d\nu_x(y)$

valid for all positive measurable h(x, y). Here $\nu_x$ is the measure over $F_B$ which is
induced by the channel when the input sequence is $x \in A^I$.

The stochastic process $[C^I, F_C, \omega]$ is now a source, which we denote by $[C, \omega]$.

It is easily shown that if the original source $[A, \mu]$ and the channel are stationary,
then the source $[C, \omega]$ is stationary.

5.1. Marginal distributions. The source $[C, \omega]$ represents the joint distribution
of x and y, of input to and output from the channel. The source $[A, \mu]$ represents
the marginal distribution of the input. The marginal distribution of the output
is represented by the source $[B, \eta]$, where the measure $\eta$ over $F_B$ is defined by

    $\int_{B^I} k(y) \, d\eta(y) = \int_{A^I} d\mu(x) \int_{B^I} k(y) \, d\nu_x(y).$

This marginal source is stationary if $[A, \mu]$ and $[A, \nu_\theta, B]$ are.

5.2. Causation. It is worth noting that the implication of causation in our
language here, as we speak of a channel driven by a source, results from the fact
that we consider the channel $[A, \nu_\theta, B]$ as a pregiven thing, existing independently
of any particular source $[A, \mu]$; this is the typical situation in the communications
art. Actually, the joint process $[C, \omega]$ is a completely symmetrical concept,
as to the roles of x and y, and one may consider, at will, the conditional probabilities
$\nu_x(S)$, $x \in A^I$, $S \in F_B$, the conditional probabilities of y-events, knowing
x, or the conditional probabilities, say, $\mu_y(U)$, $y \in B^I$, $U \in F_A$, of x-events, knowing
y. (Indeed, given the joint process, one will find that each of these conditional
probabilities $\nu_x$, respectively $\mu_y$, are measures for, respectively, almost
all x ($\mu$), almost all y ($\eta$).)

It happens that in most applications the $\nu_x$ are pregiven, and the $\mu_y$ derivative.

5.3. Channel capacity. To use Shannon’s notation, let $H(x, y)$ denote the entropy
rate of the source $[C, \omega]$, $H(x)$ the entropy rate of the marginal source
$[A, \mu]$, and $H(y)$ that of the marginal source $[B, \eta]$. The quantity
$R = H(x) + H(y) - H(x, y)$ is defined to be the transmission rate achieved by the source
$[A, \mu]$ over the channel $[A, \nu_\theta, B]$. The supremum or least upper bound of these
rates, as $\mu$ is allowed to vary, is defined to be the capacity of that channel.

5.4. Interpretation. An intuitive interpretation of the rate $H(x) + H(y) - H(x, y)$
can be obtained if we assume that the quantity $H_x(y) = H(x, y) - H(x)$
can be given the same verbal interpretation when the x and y are stochastic
processes that it was given earlier when the random quantities involved were
drawn from finite populations. That it can, in the same limiting sense that the
entropy concept has been carried over to stochastic processes, is easy to show.
Forgoing this demonstration, we observe that $R = H(y) - H_x(y)$; that is, the
rate of transmission R is the marginal rate of the output, $H(y)$, diminished by
that amount of uncertainty at the output which arises from the average uncertainty
of y even when x is known, that is, by $H_x(y)$, the average conditional
entropy of y when x is known. In this verbal way, at least, R represents that
portion of the “randomness” or average uncertainty of each output letter which
is not assignable to the randomness created by the channel itself.

Another observation here is also pertinent. Because of the symmetry of R in
x and y (which is more than a mere consequence of the notation!) we also have
$R = H(x) - H_y(x)$. This shows R as the rate of the original source diminished by
the average uncertainty as to the input x when the output y is known.
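The rate R can be computed explicitly in the simplest case. The sketch below rests on assumptions that go beyond the text: the channel is a memoryless binary symmetric channel with crossover probability `eps`, and the source emits independent letters, so that the entropy rates reduce to entropies of single letters. The names `rate`, `capacity_estimate` and `closed_form` are hypothetical, and the grid search ranges only over such independent-letter sources.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def rate(q, eps):
    """R = H(x) + H(y) - H(x, y) when P(x = 0) = q and each letter is
    flipped independently with probability eps."""
    joint = {(0, 0): q * (1 - eps), (0, 1): q * eps,
             (1, 0): (1 - q) * eps, (1, 1): (1 - q) * (1 - eps)}
    px = [q, 1 - q]
    py = [joint[(0, 0)] + joint[(1, 0)], joint[(0, 1)] + joint[(1, 1)]]
    return H(px) + H(py) - H(joint.values())

eps = 0.1

# Vary the source and keep the largest rate found on a grid of input distributions.
capacity_estimate = max(rate(k / 1000, eps) for k in range(1001))

# Known closed form for the binary symmetric channel: 1 - H(eps, 1 - eps).
closed_form = 1 - H([eps, 1 - eps])
```

On this grid the largest rate is attained at the equiprobable input and agrees with the closed form up to rounding error.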


6. The fundamental theorem. 

6.0. As justifying the theory. So far, we have introduced a list of what is 
hoped are natural-seeming concepts, and have stated a few mathematical results 
to help justify the rather picturesque language used in introducing them. The 
concepts themselves can only be justified as objects worthy of mathematical 
attention by the existence of theorems relating them. There is one such theorem, 
the so-called fundamental theorem for a noisy channel ([2], Theorem 11), which
in itself performs this task completely. We shall quote this theorem and sketch 
its proof. This will complete our general exposition and lead us to our general 
limit theorem. 

6.1. The theorem. The fundamental theorem relates to this question. Suppose
we are given a stationary channel with input alphabet A, and a stationary
ergodic source with alphabet $A_1$. We are permitted to insert a stationary transducer
$[A_1, \tau, A]$ between the source and channel, to create, in effect, a new stationary
channel with input alphabet $A_1$. With this freedom, what is the optimum
transmission rate which can be achieved between source and output?

For the class of finitary sources, channels, and transducers, admitted in the
model used in [2], this question is answered by Shannon’s theorem: Let the given
channel have capacity C and the given source have rate H. Then if $H < C$, for
any $\epsilon > 0$ there exists a transducer such that a rate $R > H - \epsilon$ can be achieved.
If $H \ge C$, there exists similarly a transducer such that $C \ge R > C - \epsilon$. No
rate greater than C can be achieved.

Actually, Shannon’s proof of this theorem proves the following more com- 
plete result. 







THEOREM. Let the given channel have capacity C and the given source have rate H.
If $H < C$, then, given any $\epsilon > 0$, there exists an integer $n(\epsilon)$ and a transducer
(depending on $\epsilon$) such that when $n(\epsilon)$ consecutive received letters are known, the
corresponding n transmitted letters can be identified correctly with probability at
least $1 - \epsilon$. If $H > C$ no such transducer exists.

This statement is perhaps more satisfying to a statistician, in that the log- 
arithmic quantities H and C appear only in the hypotheses. The conclusion is 
then given in terms of the criterion of performance suggested in 1.2. 

6.2. Interpretation. In the vernacular, this theorem asserts that if a channel
has adequate capacity C, an infinitesimal margin being mathematically adequate,
then virtually perfect transmission of the material from the source can be achieved,
but not otherwise. Here, of course, we have used “virtually perfect” to describe
transmission at a rate


(1)    $R = H - \alpha \ge H - \epsilon.$


The sense in which this is to be interpreted as virtually perfect transmission is, 
of course, an asymptotic one and refers to the rate at which certain probabilities 
decay as the amount of available received text increases. 

Engineering experience has been that the presence in the channel of perturba- 
tions, noise, in the engineer’s language, always degrades the exactitude of trans- 
mission. Our verbal interpretation above leads us to expect that this need not 
always be the case; that perfect transmission can sometimes be achieved in spite 
of noise. This practical conclusion runs so counter to naive experience that it has 
been publicly challenged on occasion. What is overlooked by the challengers is,
of course, that “perfect transmission” is here defined quantitatively in terms of
the capabilities of the channel or medium; perfection can be possible only when
transmission proceeds at a slow enough rate. When it is pointed out that merely
by repeating each message sufficiently often one can achieve virtually perfect 
transmission at a very slow rate, the challenger usually withdraws. In doing so, 
however, he is again misled, for in most cases the device of repeating messages 
for accuracy does not by any means exploit the actual capacity of the channel. 
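The point about repetition can be illustrated with a short calculation. The sketch below assumes a memoryless binary symmetric channel with crossover probability `eps`, a scheme that repeats each binary letter n times (n odd) and decodes by majority vote, and the known capacity $1 - H(\epsilon, 1 - \epsilon)$ of that channel; these choices and the names `majority_error` and `bsc_capacity` are illustrative assumptions, not taken from the paper.

```python
import math

def majority_error(n, eps):
    """Probability that more than half of n independent copies are flipped,
    so that a majority vote decodes the letter wrongly (n odd)."""
    return sum(math.comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range((n + 1) // 2, n + 1))

def bsc_capacity(eps):
    """Capacity in bits per letter of the binary symmetric channel."""
    return 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)

eps = 0.1
table = [(n, 1 / n, majority_error(n, eps)) for n in (1, 3, 5, 7, 9, 11)]
capacity = bsc_capacity(eps)

for n, letters_per_use, err in table:
    print(f"n={n:2d}  rate={letters_per_use:.3f}  error={err:.2e}")
print(f"capacity={capacity:.3f}")
```

The error probability falls as n grows, but the rate 1/n falls with it toward zero, whereas the capacity of the channel stays fixed at a value that the repetition scheme never approaches once accuracy is demanded.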

Historically, engineers have always faced the problem of bulk in their mes- 
sages, that is, the problem of transmitting rapidly or efficiently in order to make 
a given facility as useful as possible. The problem of noise has also plagued them,
and in many contexts it was realized that some kind of exchange was possible;
for example, noise could be eliminated by slower or less “efficient” transmission.
Shannon’s theorem has given a general and precise statement of the asymptotic 
manner in which this exchange takes place. 

The statistician will recognize the exchange between bulk and noise as akin 
to the more or less general exchange between sample size and validity or sig- 
nificance. 


7. The asymptotic equipartition property. 
7.0. A Basic Lemma. The theorem quoted in 6.1 is termed fundamental in
[2] because it answers a question which is clearly fundamental in the communications
art, and because it defines the applicability of the central concept of channel
capacity. Many of the later results in [2] then concern the calculation of capacities
for practical or interesting channels.

The proof of this fundamental theorem rests directly on a lemma (Theorem 3 
of [2]) which itself is a basic limit theorem in the theory of stochastic processes. 
As a mathematical theorem, this lemma requires very little of the specialized 
imagery of communication theory for its understanding. A mathematician, 
therefore, is likely to regard it as the more fundamental element. A generaliza- 
tion of it is the one contribution of the present paper. 

7.1. Shannon’s form. The basic limit theorem, as given in Theorem 3 of [2], 
asserts that the text from an ergodic finitary source possesses what we shall call 
an asymptotic equipartition property. The basic sets 

(1)    $[x_0, x_1, \cdots, x_{n-1}]$, 

as $x$ ranges over $A^I$, describe a partition of $A^I$, as we noted earlier: a partition into 
$a^n$ events, each one of which is the occurrence of a particular string of $n$ letters. 
Shannon's Theorem 3 asserts that, if $H$ is the rate of a finitary ergodic source, 
then, given $\epsilon > 0$ and $\delta > 0$, there exists an $n_0(\epsilon, \delta)$ such that, given any $n \geq n_0$, 
the basic sets (1) above can be divided into two classes: 

(i) a class whose union has $\mu$-measure less than $\epsilon$, 

(ii) a class each member $E$ of which has a measure $\mu(E)$ such that 
$| H + (1/n) \log \mu(E) | < \delta$. 

That is, this theorem asserts the possibility of dividing the long segments of 
text from a finitary source into a class of roughly equally probable segments plus 
a residual class of small total probability. 
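
As an illustration of this division into classes (i) and (ii), the sketch below takes the simplest ergodic source, independent binary letters with P(1) = p. That is an assumed special case, and `typical_mass` is a hypothetical helper name. It adds up the probability of class (ii) exactly; that probability approaches one as n grows.

```python
from math import exp, lgamma, log, log2

def typical_mass(n, p, delta):
    """Total probability of the length-n strings from an i.i.d. binary source
    (P(1) = p) whose probability mu(E) satisfies |H + (1/n) log2 mu(E)| < delta."""
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    mass = 0.0
    for k in range(n + 1):                      # k = number of ones in the string
        log2_mu = k * log2(p) + (n - k) * log2(1 - p)
        if abs(H + log2_mu / n) < delta:
            # add the probability of all C(n, k) strings with k ones (log domain)
            log_count = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            mass += exp(log_count + log2_mu * log(2))
    return mass

for n in (20, 100, 1000):
    print(n, typical_mass(n, 0.3, 0.05))
```

For short blocks class (ii) carries only a small part of the probability; the residual class (i) becomes negligible only for large n.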

7.2. Stronger form. Let us introduce here the step functions $f_n(x)$ defined in 
(4) of Section 4: $f_n(x) = -(1/n) \log \mu([0, n-1; x])$. In terms of these, the 
possibility of dividing the long segments of text into the categories (i) and (ii) above 
is easily seen to be equivalent to the assertion that the sequence $(f_n(x))$ converges 
in probability to the constant $H$. 

We shall say that a source $[A, \mu]$ has the asymptotic equipartition property, 
AEP, if the sequence $(f_n(x))$ converges in probability to a constant. 

Shannon's Theorem 3 then asserts that a finitary ergodic source has the AEP. 
We shall improve this in Section 9 to read as follows. 

THEOREM. For any source $[A, \mu]$, the sequence $(f_n(x))$ converges in $L^1$ mean ($\mu$). 
If $[A, \mu]$ is ergodic, and has rate $H$, this sequence converges in $L^1$ mean to the 
constant $H$. 

Since $L^1$ convergence here implies convergence in probability (a fact easily 
proved), we have the 

COROLLARY. Every ergodic source has the AEP. 

These are the limit theorems mentioned in the Summary. As we shall see in 
Section 8, they permit extending Shannon’s fundamental theorem, 6.1, to other 
than finitary sources. 







7.3. Interpretation. Returning to 7.1 and the description there of the AEP, 
we see that most of the probability must be accounted for by the aggregate (ii) 
of "likely" long sequences. That is, if the source has the AEP, there are, for 
large enough $n$, $2^{nH}$ likely basic sets $[x_0, \cdots, x_{n-1}]$, roughly equally probable, 
accounting in the aggregate for all but a small fraction of the total probability. 

7.4. Another corollary. The proof in [2] of the fundamental theorem uses also a 
consequence of the AE property. We examine this consequence briefly. 

Consider a stationary source $[A, \mu]$ and a stationary channel $[A, \nu_x, B]$. Suppose 
that both $[A, \mu]$ and the joint process $[C, \omega]$ which results when this source 
drives the channel have the AEP. We can write 

(2)    $-\frac{1}{n} \log \omega([0, n-1; x] \otimes [0, n-1; y])$ 
       $= -\frac{1}{n} \log \frac{\omega([0, n-1; x] \otimes [0, n-1; y])}{\mu([0, n-1; x])} - \frac{1}{n} \log \mu([0, n-1; x])$. 

Our hypothesis that the joint process has the AEP now implies that the left 
member of this equation converges in measure to a constant, namely the entropy 
rate of the joint process, $H(x, y)$. (Here the notation is misleading. In $H(x, y)$, 
the $x$ and $y$ are labels merely. Equation (2) involves $x$ and $y$ as specific variables.) 
Also by hypothesis the second term on the right converges in probability to 
$H(x)$, the entropy rate of $[A, \mu]$. It follows then that the first term on the right 
converges in probability also to a constant, which constant must then be $H_x(y)$, 
by 5.4. 
Now the first term on the right of (2) is 

(3)    $-\frac{1}{n} \log \frac{\omega([0, n-1; x] \otimes [0, n-1; y])}{\mu([0, n-1; x])}$, 

where the argument of the logarithm is the conditional probability of $[0, n-1; y]$ 
knowing that $[0, n-1; x]$ has occurred. We have therefore proved the following. 

COROLLARY. If $[A, \mu]$ and $[C, \omega]$ have the AEP, then the functions (3) converge 
in probability to a constant. 
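
The splitting in (2) can be watched numerically in a simple assumed setting: independent binary source letters with P(1) = p sent through a memoryless binary symmetric channel with flip probability q. Neither the setting nor the helper names come from the text. In this case the source term should settle near h(p) and the conditional term near h(q).

```python
import random
from math import log2

def h(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def split_terms(n, p, q, seed=0):
    """Draw n source letters, pass them through the channel, and return the two
    terms on the right of (2): the conditional term and the source term."""
    rng = random.Random(seed)
    log_mu = 0.0      # log2 of the probability of the input block
    log_cond = 0.0    # log2 of (joint block probability / input block probability)
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        flipped = rng.random() < q
        log_mu += log2(p if x == 1 else 1 - p)
        log_cond += log2(q if flipped else 1 - q)
    return -log_cond / n, -log_mu / n

cond_term, source_term = split_terms(20000, 0.3, 0.1)
print(cond_term, h(0.1))      # conditional term against its limit
print(source_term, h(0.3))    # source term against its limit
```

The sum of the two returned values is the left member of (2) for the sampled blocks.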


8. Proof of the fundamental theorem. 

8.0. Introduction. For simplicity, we do not examine the question of ergodicity 
and consider only the most interesting of the cases cited in the statement of the 
theorem (6.1), that in which we are given a finitary source $[A_1, \mu]$ of entropy rate 
$H$ and a finitary channel $[A, \nu, B]$ of capacity $C > H$, both stationary. Our 
problem is then, given $\epsilon > 0$, to exhibit an $n(\epsilon)$ and a finitary transducer $[A_1, \tau, A]$ 
such that, when the given source drives the channel through this transducer, it is 
possible at the receiver, given $n(\epsilon)$ consecutive received letters, to identify the 
corresponding $n(\epsilon)$ transmitted letters correctly with a probability exceeding 
$1 - \epsilon$. Here the probability is not conditional (i.e., not given the received letters) 
but in the universe of joint events at transmitter and receiver. 







We shall review Shannon’s argument. He does not supply detailed epsilontics 
here, and we shall not either. Generally, the manner in which they could be sup- 
plied is evident enough, though at one point we must consider a detail. (My 
efforts to make them simple have so far failed, however.) 

8.1. The "likely" events. The channel $[A, \nu, B]$ has capacity $C = H + 2\gamma$, 
say, where $\gamma > 0$. There therefore exists a source, say $[A, \mu^*]$, which achieves 
over this channel a rate 

(1)    $R^* \geq C - \gamma = H + \gamma$. 

We will use asterisks to denote quantities referring to this source. Let $H^*$ be the 
entropy rate of $[A, \mu^*]$, and let $[C, \omega^*]$ denote the joint process of input ($x$) and 
output ($y$) when $[A, \mu^*]$ drives the channel. For a simpler notation let $K^* = H^*_y(x)$, 
the average conditional entropy of input to $[C, \omega^*]$ when output is known. Then 
by definition (5.3) 

(2)    $R^* = H^* - K^*$. 

We now invoke the AEP for the processes $[A_1, \mu]$, $[A, \mu^*]$, and $[C, \omega^*]$. For 
large $n$ there are roughly $2^{nH}$ equally likely basic sets $[0, n-1; w]$ from $[A_1, \mu]$; 
call these the likely outputs of $[A_1, \mu]$. Similarly there are roughly $2^{nH^*}$ equally 
likely basic sets $[0, n-1; x]$ from $[A, \mu^*]$, the likely outputs of $[A, \mu^*]$. Furthermore, 
consider the possible basic sets $[0, n-1; y]$ at the output of the channel. 
With the exception of an aggregate of these of small total probability in $[C, \omega^*]$, 
the conditional probabilities in $[C, \omega^*]$ of the $[0, n-1; x]$, knowing $[0, n-1; y]$, 
are such that, roughly, there are $2^{nK^*}$ equally likely $[0, n-1; x]$ for each 
$[0, n-1; y]$; call these the likely antecedents to $[0, n-1; y]$. 

In each of these definitions the "likely" objects in sum exhaust most of the 
probability. In particular, the likely antecedents of $[0, n-1; y]$ exhaust most 
of the a posteriori probability in $[C, \omega^*]$ of the basic sets $[0, n-1; x]$ when 
$[0, n-1; y]$ is known. Let us use the word "package" to mean "the aggregate 
of likely antecedents to a given $[0, n-1; y]$." 

8.2. Marked basic sets. The nub of Shannon's proof lies in the fact that the 
packages are so small that it is easy to find $2^{nH}$ of them which are disjoint. Indeed, 
suppose one designates, "marks," $2^{nH}$ of the likely basic sets $[0, n-1; x]$ 
from $[A, \mu^*]$, doing so at random. Then the probability that a particular 
$[0, n-1; x]$ be marked in this process is $2^{n(H - H^*)}$. Consider the $2^{nK^*}$ likely 
antecedents of some $[0, n-1; y]$. The conditional probability that two or more of 
these get marked, knowing that one of them is marked, is of the order of 

       $2^{nK^*} 2^{n(H - H^*)} = 2^{n(H - R^*)} \leq 2^{-n\gamma}$ 

by (1) and (2). This probability may be made small by choosing a large $n$. 
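
A short arithmetic sketch of this estimate follows. The entropy figures are purely hypothetical and were chosen only so that (1) holds with $\gamma = 0.2$. The code checks that the two forms of the exponent agree and shows how quickly the bound decays with n.

```python
def collision_exponent(H, H_star, K_star):
    """Per-letter exponent (base 2) of the order-of-magnitude collision
    probability, computed both as K* + (H - H*) and as H - R*."""
    R_star = H_star - K_star              # equation (2) of 8.1
    return K_star + (H - H_star), H - R_star

# Hypothetical figures in bits per letter: H = 0.5, H* = 1.0, K* = 0.3.
lhs, rhs = collision_exponent(0.5, 1.0, 0.3)
gamma = 0.2                               # R* = 0.7 >= H + gamma, so (1) holds
for n in (10, 50, 100):
    print(n, 2 ** (n * rhs), 2 ** (-n * gamma))
```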

8.3. Distinguishability a posteriori of marked inputs. Conceptually, we now 
have this situation: some $2^{nH}$ basic sets $[0, n-1; x]$ have been specially marked. 
Given a $[0, n-1; y]$, the received message, and knowing in addition that a 
marked basic set $[0, n-1; x]$ has been transmitted (has occurred), there is but 
a small conditional probability, in the joint universe of $[C, \omega^*]$ and of random 
markings, that either of the following events has occurred. 

(i) The actual $[0, n-1; x]$ which occurred is not a likely antecedent of 
$[0, n-1; y]$; 

(ii) The actual $[0, n-1; x]$ which occurred is a likely antecedent of 
$[0, n-1; y]$, but there are other marked $[0, n-1; x]$ in the same package. 

There is now virtual certainty in the joint universe of $[C, \omega^*]$ and random markings 
that the actual $[0, n-1; x]$ is a unique marked likely antecedent 
of $[0, n-1; y]$ when we know $[0, n-1; y]$ a priori, and that a marked 
$[0, n-1; x]$ is transmitted. That is, by making a marking at random, one is almost certain 
to have chosen a limited vocabulary of $2^{nH}$ basic sets $[0, n-1; x]$ which are 
almost certain to be distinguishable a posteriori, knowing $[0, n-1; y]$. 

8.4. The transducer. The next step is deceptively simple. One shows easily 
that, given a marking, a finitary transducer can be described which maps the 
$2^{nH}$ likely $[0, n-1; w]$ from $[A_1, \mu]$ on to the marked $[0, n-1; x]$ from $[A, \mu^*]$. 
When one drives this transducer from $[A_1, \mu]$, the likely output basic sets 
$[0, n-1; x]$ are just those which were marked. Therefore, when one operates 
the channel from $[A_1, \mu]$ through this transducer he has essentially only the 
vocabulary of marked basic sets appearing at the input to the channel. Let us 
call the resulting joint process of input $x$ to, and output $y$ from, the channel the 
source $[C, \omega]$. This source itself depends on the marking. 

If the probabilities sketched in 8.3 can be relied on for this new situation, it is 
evident that we have described a transducer, depending on a random marking, 
which, when $[0, n-1; y]$ is given, permits the correct identification of the 
$[0, n-1; x]$ which occurred in all but a set of cases of small probability (a 
posteriori, knowing $[0, n-1; y]$) in the joint universe of random markings and 
events in $[C, \omega]$. We can assume that for all likely $[0, n-1; x]$ the input 
$[0, n-1; w]$ which produced it is unique. Then the average, over the joint universe 
of markings and events in $[C, \omega]$, of the probability that the actual $[0, n-1; w]$ 
which occurred is not the one determined by this procedure is small. By the 
Tchebycheff inequality, then, all but a small fraction of the markings will describe 
transducers which make the probability of misidentifying the actual 
$[0, n-1; w]$ simultaneously small for all but a small fraction of the $[0, n-1; y]$. 

8.5. Critique. This argument shows that it is somehow easy to describe a 
transducer which will make the probability of error small. There is, however, a 
gap in the argument. The probabilities calculated in 8.3 were based on $[C, \omega^*]$. 
In 8.4 we used these as though they applied to any $[C, \omega]$ which might arise when 
a marking had been made. If they are both ergodic, and this we are tacitly assuming, 
$\omega$ and $\omega^*$ are either identical or else each assigns unit probability to a 
null set of the other. (This is almost trivial to prove. To my knowledge it was 
first explicitly noted by G. W. Brown.) The probabilities in 8.3 are based on relations 
which hold only almost everywhere in $[C, \omega^*]$, and therefore, possibly, at 
most on a null set in $[C, \omega]$. This point is not touched on in [2]. 

In 10.1 we shall show that finitary channels have a kind of continuity which 
permits passage from $[C, \omega^*]$ to $[C, \omega]$, when $n$ is large enough, without serious 
modification of the probabilities. Shannon's argument is then valid, though incomplete, 
for finitary channels. Indeed, it is valid for any channel having this 
kind of continuity, but I have not yet found a satisfying formulation of the 
property or isolation of the class. 


9. The limit theorem. 

9.0. Introduction. This section is devoted principally to the proof of the theorem 
quoted in 7.2, which has as a corollary that every ergodic source has the AEP. 
We recall the definitions of 2.1 and 2.2, and use the following notation. 

Given any fixed $x \in A^I$, the symbols $[t, t+n-1; x]$, $[x_t, \cdots, x_{t+n-1}]$ denote 
that basic set which consists of all $x' \in A^I$ such that $x'_{t+k} = x_{t+k}$, 
$k = 0, 1, \cdots, n-1$. 

Given a source $[A, \mu]$, the symbols $\int f(x)\, d\mu(x)$, $\int f\, d\mu$, denote integration 
over the space $A^I$. Integration over a measurable subset $S \subset A^I$ is denoted by 
one of $\int_S f(x)\, d\mu(x)$, $\int_S f\, d\mu$. 

Following [3], we append "($\mu$)" to a statement which holds almost everywhere 
with respect to $\mu$, or to a statement involving mean convergence relative to $\mu$. 

9.1. The Theorem. Given the source $[A, \mu]$ we define the following step functions 
of $x \in A^I$. 

(1)    $p_n(x) = \frac{\mu([-n, 0; x])}{\mu([-n, -1; x])}$,  $n \geq 1$; 
       $p_0(x) = \mu([x_0])$; 
       $g_n(x) = -\log p_n(x)$; 
       $f_n(x) = -\frac{1}{n} \log \mu([0, n-1; x])$,  $n \geq 1$. 

The function $p_n(x)$ is the conditional probability that the letter which occurs 
at time $t = 0$ is $x_0$ when it is known that the letters between time $t = -n$ and 
$t = -1$ are also those of the infinite sequence $x$. The definitions of $g_n(x)$ and 
$f_n(x)$ need no comment. They are related by the important and easily verified 
formula 

(2)    $f_N(x) = \frac{1}{N} \sum_{k=0}^{N-1} g_k(T^k x)$. 

What is now to be proved is the 

THEOREM. For any source $[A, \mu]$, the sequence $(f_n(x))$ converges in $L^1$ mean 
($\mu$). If $[A, \mu]$ is ergodic, the limit of this sequence is almost everywhere constant and 
equal to $H$, the information rate of $[A, \mu]$. 
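
Formula (2) of this subsection is the chain rule for the probability of a block, written in logarithms. A small check on a hypothetical two-state stationary Markov source (the matrix `P`, the vector `PI` and all function names are assumptions made for illustration) shows the two sides agreeing on a sample string.

```python
from math import log2

# Hypothetical two-state stationary Markov source.
P = [[0.9, 0.1],      # P[i][j] = probability that letter j follows letter i
     [0.4, 0.6]]
PI = [0.8, 0.2]       # stationary distribution, PI = PI * P

def mu(block):
    """Probability of the basic set fixing the given consecutive letters."""
    prob = PI[block[0]]
    for prev, nxt in zip(block, block[1:]):
        prob *= P[prev][nxt]
    return prob

def f(N, x):
    """f_N(x) = -(1/N) log mu([0, N-1; x])."""
    return -log2(mu(x[:N])) / N

def g_shifted(k, x):
    """g_k(T^k x): minus the log of the conditional probability of x_k given
    x_0, ..., x_{k-1}; stationarity lets us evaluate it on the block [0, k]."""
    if k == 0:
        return -log2(mu(x[:1]))
    return -log2(mu(x[:k + 1]) / mu(x[:k]))

x = [0, 0, 1, 1, 0, 1, 0, 0]
N = len(x)
cesaro = sum(g_shifted(k, x) for k in range(N)) / N
print(f(N, x), cesaro)    # the two sides of (2)
```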







9.2. PROOF. The proof of this theorem requires the following intermediate 
results. 

(i) The sequence $(p_n(x))$ converges almost everywhere ($\mu$). 

(ii) Each $g_n(x) \in L^1(\mu)$, and the sequence $(g_n(x))$ converges in $L^1$ mean ($\mu$). 

These will be established in 9.3 and 9.4, respectively. Granted the second of 
them, the theorem to be proved follows easily, as we now show. 

We have $g_n(x) \in L^1$, and $\lim_n \int | g_n - g |\, d\mu = 0$ for some function $g \in L^1$. 
Then the mean ergodic theorem (e.g., [4], equation 2.42) implies that 
$\sum_{k=0}^{N-1} g(T^k x)/N$ converges in $L^1$ mean to an invariant function $h(x) = h(Tx)$. 
When $\mu$ is ergodic $h(x) = H$, a constant, almost everywhere. 

By (2) of 9.1 

       $\int | f_N - h |\, d\mu \leq \int \left| \frac{1}{N} \sum_{k=0}^{N-1} [g_k(T^k x) - g(T^k x)] \right| d\mu(x) + \int \left| \frac{1}{N} \sum_{k=0}^{N-1} g(T^k x) - h(x) \right| d\mu(x)$ 

       $\leq \frac{1}{N} \sum_{k=0}^{N-1} \int | g_k(x) - g(x) |\, d\mu(x) + \int \left| \frac{1}{N} \sum_{k=0}^{N-1} g(T^k x) - h(x) \right| d\mu(x)$. 

In the second inequality we use the stationarity of $\mu$ to obtain the first term. 
This term represents the first Cesaro mean of a sequence which by hypothesis 
has zero as a limit; hence it has also the limit zero. The second term also goes to 
zero as $N \to \infty$, by the mean ergodic theorem. We conclude then that $f_N \to h$ 
in $L^1$ mean, and that $f_N \to H$ in $L^1$ mean when $\mu$ is ergodic. We identify this 
constant $H$ with the entropy rate of $[A, \mu]$ in 9.4. 

9.3. First Lemma. We now prove that the sequence $(p_n(x))$ converges almost 
everywhere ($\mu$). For any given set $D \in F_A$, define, in analogy with 9.1, (1), 

       $p_n(x, D) = \frac{\mu([-n, -1; x] \cap D)}{\mu([-n, -1; x])}$; 

this is the conditional probability of $D$ knowing $x_{-n}, x_{-n+1}, \cdots, x_{-1}$. It is a result 
of Doob [5] that such a sequence of conditional probabilities is a martingale 
(positive and bounded) and converges almost everywhere. 

Given $\alpha \in A$, let $D_\alpha$ denote the basic set of all $x \in A^I$ with $x_0 = \alpha$. Given any 
$x \in A^I$, the value of $p_n(x)$ is one of the numbers $p_n(x, D_\alpha)$ obtained as $\alpha$ ranges 
over the finite set $A$. Therefore 

(3)    $| p_n(x) - p_m(x) | \leq \sum_{\alpha \in A} | p_n(x, D_\alpha) - p_m(x, D_\alpha) |$, 

because the left member is, for each $x$, one of the summands on the right. 

Except for $x$ in a certain null set ($\mu$), each term on the right of (3) converges 
to zero as $m$ and $n$ go to infinity, by the result of Doob quoted above. By (3), then, 
the sequence $p_n(x)$ converges almost everywhere ($\mu$), say to $p(x)$. 

It follows at once that the sequence $g_n(x) = -\log p_n(x)$ converges almost 
everywhere to $g(x) = -\log p(x)$, if we admit convergence to $+\infty$. 

9.4. Second Lemma. We must now show that $g_n(x) \in L^1$ and that the sequence 
$g_n(x)$ converges in mean ($\mu$). The integrability of the $g_n(x)$ is simple to establish 
directly, but will follow automatically from stronger results which are needed 
later. We need a uniform bound for the contribution of the "unbounded part" 
of $g_n$ to the value of $\int g_n\, d\mu$. We shall therefore show that 

(4)    $\int_{A_{n,L}} g_n\, d\mu \leq O(L 2^{-L})$ 

uniformly in $n$, where $A_{n,L}$ is the set of $x$'s such that $g_n(x) \geq L$. 
Let $E_{n,K}$ be the set of $x$'s where 

(5)    $K \leq g_n(x) < K + 1$. 

Let $B$ denote a typical basic set $[-n, -1; x]$. Given $\alpha \in A$, let $D_\alpha$, as before, 
be the basic set of all $x$ such that $x_0 = \alpha$. By its definition, $g_n(x)$ is constant over 
each $B \cap D_\alpha$; in fact, it has there the value $-\log[\mu(B \cap D_\alpha)/\mu(B)]$. Hence $g_n(x)$ 
is measurable. 

Let $a$ be the number of letters in the alphabet $A$. There are altogether finitely 
many, namely $a^{n+1}$, sets $B \cap D_\alpha$ covering $A^I$. Since $g_n(x) \geq 0$ everywhere, we 
have 

       $A^I = \bigcup_B \bigcup_{K=0}^{\infty} B \cap E_{n,K}$. 

For fixed $n$, $K$, let $D^K$ range over those $D_\alpha$ such that $B \cap D_\alpha \cap E_{n,K} \neq \emptyset$. Then the step 
character of $g_n(x)$ implies that $B \cap E_{n,K} = \bigcup_{D^K} B \cap D^K$. Therefore 

(6)    $\int_{B \cap E_{n,K}} g_n\, d\mu = \sum_{D^K} \int_{B \cap D^K} g_n\, d\mu$ 

and, furthermore, over any $B \cap D^K$, (5) holds. Therefore $-\log[\mu(B \cap D^K)/\mu(B)] \geq K$, 
or $\mu(B \cap D^K) \leq 2^{-K} \mu(B)$. From (6), then, 

       $\int_{B \cap E_{n,K}} g_n\, d\mu \leq \sum_{D^K} (K + 1) 2^{-K} \mu(B) \leq a(K + 1) 2^{-K} \mu(B)$. 

We have then that 

(7)    $\int_{E_{n,K}} g_n\, d\mu = \sum_B \int_{B \cap E_{n,K}} g_n\, d\mu \leq a(K + 1) 2^{-K}$ 

since $\sum_B \mu(B) = 1$. The right member of (7) is the $K$th term of a convergent series 
and is independent of $n$. Since $A_{n,L} = \bigcup_{K \geq L} E_{n,K}$, (4) follows at once. 
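
The step from (7) to (4) is a tail sum. The following sketch (the function names are hypothetical) adds the right members of (7) over $K \geq L$ numerically and compares the result with the closed form $a(L + 2)2^{-(L-1)}$ obtained by summing the series directly. That closed form is a standard evaluation supplied here for illustration, not a formula from the text, and it is of the order $L 2^{-L}$ as (4) requires.

```python
def tail_bound(a, L, terms=200):
    """Sum over K >= L of the bound a (K + 1) 2^{-K} from (7), truncated after
    `terms` summands (the neglected remainder is far below float precision)."""
    return sum(a * (K + 1) * 2.0 ** (-K) for K in range(L, L + terms))

def tail_closed_form(a, L):
    """Closed form of the same series: a (L + 2) 2^{-(L - 1)}."""
    return a * (L + 2) * 2.0 ** (-(L - 1))

a = 2   # alphabet size, chosen only for illustration
for L in (0, 5, 10, 20):
    print(L, tail_bound(a, L), tail_closed_form(a, L))
```

With $L = 0$ the sum is $4a$, which serves as a uniform bound for $\int g_n\, d\mu$.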
That $g_n \in L^1$ follows by summing (7) over all $K \geq 0$. This summation gives a 
uniform bound, say $\beta$, for $\int g_n\, d\mu$. Define $g_n^L(x) = \inf(g_n(x), L)$, 
$g^L(x) = \inf(g(x), L)$. Then $\lim_n g_n^L(x) = g^L(x)$, ($\mu$), and this convergence is 
dominated by the integrable function $L$. Hence 

(8)    $\lim_n \int | g_n^L - g^L |\, d\mu = 0$, 

and $g^L \in L^1$. Furthermore 

(9)    $\int g^L\, d\mu = \lim_n \int g_n^L\, d\mu \leq \limsup_n \int g_n\, d\mu \leq \beta$. 

By (9) and the definition of the left-hand side, 

       $\int g\, d\mu = \lim_L \int g^L\, d\mu \leq \beta$, 

whence $g \in L^1$. Furthermore 

(10)   $\lim_L \int | g - g^L |\, d\mu = \lim_L \int (g - g^L)\, d\mu = 0$. 

We have now 

       $\int | g_n - g |\, d\mu \leq \int | g_n - g_n^L |\, d\mu + \int | g_n^L - g^L |\, d\mu + \int | g^L - g |\, d\mu$. 

The first term on the right is dominated by 

       $\int_{A_{n,L}} g_n\, d\mu$ 

where $A_{n,L}$ is the set over which $g_n(x) \geq L$. By (4) and (8) therefore, 

       $0 \leq \limsup_n \int | g_n - g |\, d\mu \leq O(L 2^{-L}) + \int | g^L - g |\, d\mu$. 

We let $L \to \infty$ and use (10) to conclude that $\lim_n \int | g_n - g |\, d\mu = 0$. 
This establishes the mean convergence of the sequence $(g_n(x))$. 

It was shown in 4.3 that the entropy rate of $[A, \mu]$ is $\lim_{n \to \infty} \int g_n\, d\mu$. From what 
we have just shown, $\lim_{n \to \infty} \int g_n\, d\mu = \int g\, d\mu$. In 9.2, $h(x)$ is defined as the limit 
of $\frac{1}{N} \sum_{k=0}^{N-1} g(T^k x)$ and we know by the ergodic theorem then that 
$\int h\, d\mu = \int g\, d\mu$. When $\mu$ is ergodic $h(x) = H$, a constant, almost everywhere. Therefore 

       $H = \int H\, d\mu = \int h\, d\mu = \int g\, d\mu = \lim_n \int g_n\, d\mu$. 

This identifies $H$ with the entropy rate of $[A, \mu]$. 







10. Finitary devices. 


10.0. Sources. Shannon’s Markov-like sources, which we have here called 
finitary, are defined by a construction equivalent (in a sense to be made precise 
later) to one now to be described. 

Consider a Markov process with finitely many states, enumerated $1, 2, \cdots, S$. 
Let $p_{ij}$ be the probability of the transition from state $j$ to state $i$, and $\xi_i$ the 
stationary probability of occupancy of state $i$, so that 

       $\xi_i = \sum_{j=1}^{S} p_{ij} \xi_j$,   $1 \leq i \leq S$. 

Let $B$ be the alphabet whose letters are the symbols $1, 2, \cdots, S$. We may 
suppose that the Markov process makes a transition at each time $\tau = t + \frac{1}{2}$, 
$t = 0, \pm 1, \pm 2, \cdots$. We define the stationary information source $[B, \nu]$ by the 
rule that the letter which occurs at time $t \in I$ is the name of the state in which 
the Markov process is at that time. The $p_{ij}$ and the $\xi_i$ are enough to define this 
source. A source defined in this way will be called a finite Markov source. 
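
A minimal sketch of how the stationary probabilities might be obtained numerically follows. It keeps the convention above that the entry in row i, column j is the probability of the transition from state j to state i, so each column sums to one. The three-state matrix is hypothetical, and plain repeated multiplication is used on the assumption that the chain mixes (irreducible and aperiodic); it is not a general-purpose solver.

```python
def stationary(p, iterations=1000):
    """Iterate xi_i <- sum_j p[i][j] * xi[j], starting from the uniform vector.
    p[i][j] is the probability of the transition from state j to state i."""
    S = len(p)
    xi = [1.0 / S] * S
    for _ in range(iterations):
        xi = [sum(p[i][j] * xi[j] for j in range(S)) for i in range(S)]
    return xi

# Hypothetical three-state chain (each column sums to one).
p = [[0.5, 0.2, 0.0],
     [0.5, 0.5, 0.3],
     [0.0, 0.3, 0.7]]
xi = stationary(p)
print(xi)
```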

Let $A$ be an arbitrary alphabet and let $\phi$ be a function from $B$ to $A$: 
$\alpha = \phi(\beta)$, $\beta \in B$, $\alpha \in A$. Given $y \in B^I$, say $y = (\cdots, y_{-1}, y_0, y_1, \cdots)$, we define 
$x = \phi(y) \in A^I$ by $x = (\cdots, x_{-1}, x_0, x_1, \cdots)$, where $x_t = \phi(y_t)$, $t \in I$. Let $\mu$ be 
the measure over $F_A$ defined by this construction. In the notation of [3], 
$\mu = \nu\phi^{-1}$. The source $[A, \mu]$ we will call a projection of $[B, \nu]$. The notion of projection 
clearly applies even when $[B, \nu]$ is not Markov. 

An arbitrary source $[A, \mu]$ will be called finitary if it is a projection of some 
finite Markov source $[B, \nu]$. Shannon's sources are all of this kind in the sense 
that, given any source of his, there is a projection of a finite Markov source 
which produces the same ensemble of text, and conversely. 

Consider now an $[A, \mu]$, a projection by $\phi$ of $[B, \nu]$. Given $\alpha \in A$, let $\phi^{-1}(\alpha)$ 
denote that subset of $B$ consisting of all $\beta$ such that $\phi(\beta) = \alpha$. We will call $[A, \mu]$ 
unifilar if for each $\alpha \in A$ and each state $i \in B$ there is at most one transition from 
$i$ to $\phi^{-1}(\alpha)$ which has nonzero probability. 
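
The unifilar condition is easy to test mechanically. The sketch below is one assumed way of encoding it: the transition probabilities are stored with the same index convention as $p_{ij}$ (row = next state, column = current state), the projection is a list of output letters, and the example matrix and letters are hypothetical.

```python
def is_unifilar(p, phi):
    """p[i][j]: probability of the transition from state j to state i.
    phi[i]: letter emitted when the chain is in state i.
    Returns True when, from every state, at most one transition of nonzero
    probability leads into each set of states sharing an output letter."""
    S = len(p)
    for j in range(S):                 # current state
        seen = set()
        for i in range(S):             # candidate next state
            if p[i][j] > 0:
                if phi[i] in seen:
                    return False
                seen.add(phi[i])
    return True

# Hypothetical three-state chain (each column sums to one).
p_example = [[0.5, 0.2, 0.0],
             [0.5, 0.5, 0.3],
             [0.0, 0.3, 0.7]]
print(is_unifilar(p_example, ["a", "b", "c"]))   # one letter per state
print(is_unifilar(p_example, ["a", "b", "a"]))   # states 0 and 2 share a letter
```

In the second call the middle state can move to either of two states that emit the same letter, so the projected source is not unifilar.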

The definition of [2], paragraph 7, and certain related results, are tacitly restricted 
to finitary and unifilar sources. The word "finitary" as we use it in discussing 
the proof of the fundamental theorem (Section 8) may, however, be 
interpreted in the wider sense defined above: being a projection of a finite Markov 
process. 

10.1. Channels. We now frame a definition of "finitary channel" consonant 
with that just given for finitary sources. A finitary channel is specified by: 

(i) An input alphabet $A$. 

(ii) An output alphabet $B$. 

(iii) A finite set $D = (1, 2, \cdots, K)$ of states. We treat $D$ as an alphabet. 

(iv) A set of Markov transition matrices, $\| q_{ij}(\alpha) \|$, one matrix for each 
$\alpha \in A$. Each element $q_{ij}(\alpha)$ represents a conditional probability of transition from 
state $j \in D$ to state $i \in D$ knowing that the input letter is $\alpha$. We have $\sum_i q_{ij}(\alpha) = 1$ 
for each $j$ and $\alpha$. 

(v) A function $\psi$ from $D$ to $B$. The output letter from the channel is $\psi(i)$ 
whenever the transition is to the state $i \in D$. 

Consider a stationary source $[A, \mu]$ driving the channel so specified. Let $\lambda_j$ 
be the stationary probability of finding the channel in state $j \in D$ (if such a 
probability exists). Then the probability that the letter $\alpha$ be presented to the 
channel and that the channel make a transition to state $i \in D$ is 

(1)    $\sum_{j \in D} \mu([\alpha])\, q_{ij}(\alpha)\, \lambda_j$. 

Stationarity of the system requires now that the sum of these numbers over all 
$\alpha \in A$ be $\lambda_i$. That is, the vector $(\lambda_i, i \in D)$ must be invariant under left 
multiplication by the Markov matrix 

       $Q = \| \sum_{\alpha} \mu([\alpha])\, q_{ij}(\alpha) \|$. 

At least one such invariant probability vector exists. If, for example, each 
matrix $\| q_{ij}(\alpha) \|$ has a unique such invariant vector, then in general the $\lambda_i$ will 
also be unique and they will be continuous functions of the letter frequencies of 
the source. 
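
The construction of the mixed matrix and its invariant vector can be sketched as follows. The two-state channel, its transition matrices and the letter frequencies are hypothetical figures. The matrices are stored so that the entry in row i, column j is the probability of moving from state j to state i, and repeated multiplication is assumed to converge.

```python
def mixed_matrix(q, letter_freq):
    """Q[i][j] = sum over input letters of (letter frequency) * q[letter][i][j]."""
    K = len(next(iter(q.values())))
    return [[sum(letter_freq[a] * q[a][i][j] for a in q) for j in range(K)]
            for i in range(K)]

def invariant_vector(Q, iterations=2000):
    """Repeated left multiplication by Q, starting from the uniform vector;
    assumes the mixed chain is irreducible and aperiodic."""
    K = len(Q)
    lam = [1.0 / K] * K
    for _ in range(iterations):
        lam = [sum(Q[i][j] * lam[j] for j in range(K)) for i in range(K)]
    return lam

# Hypothetical channel with two states and two input letters.
q = {"0": [[0.9, 0.2], [0.1, 0.8]],
     "1": [[0.5, 0.5], [0.5, 0.5]]}
letter_freq = {"0": 0.5, "1": 0.5}
Q = mixed_matrix(q, letter_freq)
lam = invariant_vector(Q)
print(Q, lam)
```

Changing `letter_freq` slightly changes `lam` only slightly in this example, which is the continuity the text appeals to.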

Given the $\lambda_j$ above, the joint probability that letters $\alpha_1, \cdots, \alpha_n$ be presented 
to the channel and that the corresponding sequence of states of the channel be 
$i_1, i_2, \cdots, i_n$ is similar to the expression (1): 

(2)    $\mu([\alpha_1, \cdots, \alpha_n]) \sum_{j \in D} q_{i_n i_{n-1}}(\alpha_n) \cdots q_{i_2 i_1}(\alpha_2)\, q_{i_1 j}(\alpha_1)\, \lambda_j$. 

The joint probability of input letters $[\alpha_1, \cdots, \alpha_n]$ and output letters 
$[\beta_1, \cdots, \beta_n]$ is found by summing (2) for 

       (1) all $i_1$ in $\psi^{-1}(\beta_1)$, 
       (2) all $i_2$ in $\psi^{-1}(\beta_2)$, 
       $\cdots$ 
       (n) all $i_n$ in $\psi^{-1}(\beta_n)$. 

The conditional probability of $[\beta_1, \cdots, \beta_n]$ knowing $[\alpha_1, \cdots, \alpha_n]$ is then 

(3)    $\sum_n \sum_{n-1} \cdots \sum_1 \sum_{j \in D} q_{i_n i_{n-1}}(\alpha_n) \cdots q_{i_1 j}(\alpha_1)\, \lambda_j$, 

where $\sum_k$ denotes the summation of $i_k$ over $\psi^{-1}(\beta_k)$. The expression (3) depends 
on $[\alpha_1, \cdots, \alpha_n]$ and $[\beta_1, \cdots, \beta_n]$, and not otherwise upon past history. It is 
independent of the source except for the continuous dependence of the $\lambda_j$ upon 
the letter frequencies. This continuity is sufficient for the proof in Section 8, 
since it is easy there to guarantee that the source $[A, \mu^*]$ and the source which 
results from putting $[A_1, \mu]$ through the transducer there defined have virtually 
the same letter frequencies. 
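
Expression (3) need not be evaluated by enumerating all state paths; the nested sums can be carried out one letter at a time. The sketch below does this for a hypothetical two-state channel. The matrices, the output map `psi` and the initial weights `lam` are illustrative assumptions, with `q[letter][i][j]` the probability of moving from state j to state i.

```python
def output_probability(q, psi, lam, inputs, outputs):
    """Expression (3): probability of the output letters given the input
    letters, summed over all state paths whose k-th state is mapped by psi
    to the k-th output letter."""
    K = len(lam)
    weights = list(lam)                # weight carried by each current state
    for alpha, beta in zip(inputs, outputs):
        weights = [
            sum(q[alpha][i][j] * weights[j] for j in range(K))
            if psi[i] == beta else 0.0
            for i in range(K)
        ]
    return sum(weights)

# Hypothetical channel: two states, inputs "0"/"1", outputs "x"/"y".
q = {"0": [[0.9, 0.2], [0.1, 0.8]],
     "1": [[0.5, 0.5], [0.5, 0.5]]}
psi = ["x", "y"]                       # output letter attached to each state
lam = [7 / 13, 6 / 13]                 # stationary state probabilities (assumed)
print(output_probability(q, psi, lam, ["0", "1"], ["x", "y"]))
```

For a fixed input block the values returned over all output blocks of the same length add up to one, as conditional probabilities should.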







10.2. A discrepancy. The purist will observe that in 2.24 we defined a channel 
as a set of conditional probability measures $\nu_x$ over outputs, where $x$ represents 
the input sequence. The construction in 10.1 is not obviously of this kind, since 
the measures $\nu_x$ there obtained might well depend not only on $x$ but also on the 
particular source-ensemble in mind at the moment. We will not clarify the point 
here. Some pedagogic license was used in 2.24, and it is simpler to enlarge the 
notion of channel beyond that defined there than to try to reconcile the two 
definitions. 


11. A useful theorem. Let $\Omega$ be an abstract countable set of elements $\omega$. 
Let $A$ be a finite set, an alphabet. Let $\mu$ be a probability measure over a Borel 
field containing all sets $S \otimes W$, where $S \in F_A$, and $W \subset \Omega$. Define the measures 
$\mu_\omega$ over $F_A$ by $\mu_\omega(S) = \mu(S \otimes \omega)/\mu(A^I \otimes \omega)$. This definition is valid for almost 
every $\omega$. Define $\bar\mu$ over $F_A$ by $\bar\mu(S) = \mu(S \otimes \Omega)$. Suppose that the source $[A, \bar\mu]$ 
is ergodic and has rate $H$. Suppose that 

(1)    $\int [-\log \mu(A^I \otimes \omega)]\, d\mu < \infty$. 

Then the functions $f_n(x, \omega) = -(\log \mu([0, n-1; x] \otimes \omega))/n$ converge in $L^1$ 
mean to $H$ relative to $\mu$. Considering $\omega$ as a parameter, for almost every $\omega$ the 
functions $f_n(x, \omega)$ converge in $L^1$ mean to $H$ relative to $\mu_\omega$. 

PROOF. Since $\mu([0, n-1; x] \otimes \omega) \leq \bar\mu([0, n-1; x])$ we have 

(2)    $f_n(x, \omega) \geq -\frac{1}{n} \log \bar\mu([0, n-1; x]) = g_n(x)$, 

where the second equality sign defines $g_n(x)$. Fix $n$ and consider the countable 
list of events $[0, n-1; x] \otimes \omega$. By the composition law (3.2), extended to infinite 
sums, the entropy of this list of events is the sum of the entropy of the 
$[0, n-1; x]$ and the conditional entropy of $\omega$ knowing $[0, n-1; x]$: 

(3)    $H([0, n-1; x] \otimes \omega) = H([0, n-1; x]) + H_{n,x}(\omega)$. 

Now the convexity law (3.3) implies that the average conditional entropy 
$H_{n,x}(\omega)$ is always less than the unconditional entropy of $\omega$, which latter is the 
integral asserted to be finite in (1). Hence there is a finite $K$ such that for all $n$ 

(4)    $H_{n,x}(\omega) \leq K$. 

From (2), (3), (4) and the definitions of $\bar\mu$ and the entropies, 

       $\int | f_n(x, \omega) - g_n(x) |\, d\mu(x \otimes \omega) = \int f_n(x, \omega)\, d\mu(x \otimes \omega) - \int g_n(x)\, d\bar\mu(x)$ 

       $= \frac{1}{n} H([0, n-1; x] \otimes \omega) - \frac{1}{n} H([0, n-1; x]) \leq \frac{1}{n} K$. 

Therefore 

       $\int | f_n - H |\, d\mu \leq \int | f_n - g_n |\, d\mu + \int | g_n - H |\, d\mu \leq \frac{1}{n} K + \int | g_n - H |\, d\bar\mu$. 

Since by hypothesis and (9.1) $g_n$ tends to $H$ in $L^1$ mean ($\bar\mu$), the first conclusion 
of the theorem follows. 

For the second conclusion of the theorem, we note that 

       $\int | f_n(x, \omega) - H |\, d\mu(x \otimes \omega) = \sum_\omega \mu(A^I \otimes \omega) \int | f_n(x, \omega) - H |\, d\mu_\omega(x)$. 

Since the left-hand side has limit zero, every term on the right for which 
$\mu(A^I \otimes \omega) \neq 0$ must have limit zero. 

As an application of this theorem let $[B, \nu]$ be a stationary source. Let $[A, \bar\mu]$ 
be a projection by $\phi$ of $[B, \nu]$. Let $\Omega$ coincide with the alphabet $B$ and for $S \in F_A$ 
define $\mu$ by $\mu(S \otimes \omega) = \nu(\phi^{-1}(S) \cap D_\omega)$ where $D_\omega$ is the set of $y \in B^I$ such 
that $y_{-1} = \omega$. The theorem then implies (if $[A, \bar\mu]$ is ergodic) that the rate of 
$[A, \bar\mu]$ may be calculated by considering conditional probabilities knowing that 
the letter $\omega$ occurred at time $-1$. When $[B, \nu]$ is finite and Markov, this often 
leads to simplified calculations. 

As another application, consider a fixed countable partition of $A^I$ into sets 
$S_\omega \in F_A$. Given an ergodic source $[A, \bar\mu]$, define $\mu(S \otimes \omega)$ by 
$\mu(S \otimes \omega) = \bar\mu(S \cap S_\omega)$. The theorem then implies that the entropy rate of $[A, \bar\mu]$ can be calculated 
using only partitions which refine the given one. 


REFERENCES 

[1] Norbert Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, 
    John Wiley and Sons, 1949. 

[2] C. E. Shannon, "A mathematical theory of communication," Bell System Technical 
    Journal, Vol. 27, pp. 379-423, pp. 623-656. 

[3] P. R. Halmos, Measure Theory, D. Van Nostrand, 1950. 

[4] Norbert Wiener, "The ergodic theorem," Duke Mathematical Journal, Vol. 5, pp. 1-18. 

[5] J. L. Doob, Stochastic Processes, John Wiley and Sons, 1953. 





ON A HEURISTIC METHOD OF TEST CONSTRUCTION AND ITS USE 
IN MULTIVARIATE ANALYSIS 


By S. N. Roy 
University of North Carolina 


1. Summary. In this paper two closely related heuristic principles of test con- 
struction (to be explained in Section 3), called Type I and Type II methods, of 
which Type II is identified with the usual likelihood ratio method, are noticed 
as underlying most of the classical tests of hypotheses, simple or composite, on 
means of univariate normal populations, and on total or partial correlations or 
regressions in the case of multinormal variates. In these situations the two 
methods are found to lead to identical tests having properties which happen to 
be very good in certain cases and moderately good in others. For certain types of 
composite hypotheses an extension is then made of the Type I method which 
is applied to construct tests of three different classes of hypotheses on multi- 
normal populations (so as to cover, between them, a very large area of multi- 
variate analysis), yielding in each case a test in general different from the cor- 
responding and current likelihood ratio test. In each case, however, the two tests 
happen to come out identical for some degenerate "degrees of freedom." In 
contrast to the likelihood ratio test it is found that in these cases, for general 
“degrees of freedom,” the corresponding Type I test is much easier to use on 
small samples, because of the relatively greater simplicity of the corresponding 
small sample distribution problem under the null hypothesis. In each case a 
lower bound of the power function of the Type I test is also given (against all 
relevant alternatives), anything like which, so far as the author is aware, would 
be far more difficult to obtain for the Type II tests in these situations. In this 
paper the general approach to the two methods is entirely of a heuristic nature 
except that, under fairly wide conditions, a lower bound to the power functions 
for each of the two types of tests is indicated to be readily available, which, how- 
ever, is much too crude or wide a bound in general. 


2. Notation and preliminaries. As far as possible observations and sample 
quantities will be noted by Roman letters and population parameters by Greek 
letters; scalars by small letters, matrices by capital letters, column vectors by 
small letters underscored, and row vectors by priming them; the determinant 
of a square matrix $M$ by $|M|$; "positive definite" by p.d.; "positive semidefinite" 
by p.s.d.; "except for a set of points of probability measure zero" or "almost 
everywhere" by a.e. $A(:p \times q)$ will indicate that the matrix $A$ is $p \times q$, 
and $I(p)$ will stand for a $p \times p$ unit matrix. $x : N(\xi, \sigma^2)$, 
$\underline{x}(:p \times 1) : N(\underline{\xi}(:p \times 1), \Sigma(:p \times p))$ and 
$X(:p \times n) : N(\underline{\xi}(:p \times 1), \Sigma(:p \times p))$ will indicate respectively 
that the scalar $x$ is normally distributed about a mean $\xi$ with a variance $\sigma^2$, the 
column vector $\underline{x}$ is multinormally distributed about a mean vector $\underline{\xi}$ with a 
(p.d.) covariance matrix $\Sigma$, and the $n$ column vectors of the $p \times n$ matrix $X$ 
are independently and multinormally distributed, each column about a mean 
vector $\underline{\xi}$ with a (p.d.) covariance matrix $\Sigma$. Exceptions to this notation will be 
clearly indicated at the proper places. For the sake of clarity, it may be noted 
that the $X$ above has the probability density 

       $[1/((2\pi)^{pn/2} |\Sigma|^{n/2})] \exp[-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1}(X - \xi)(X' - \xi')]$, 

where $\xi(:p \times n)$ is a $p \times n$ matrix each column of which is the column vector $\underline{\xi}$. 
Furthermore, $d\underline{x}$ will stand for $\prod_{i=1}^{p} dx_i$ and $dX$ for $\prod_{i=1}^{p} \prod_{j=1}^{n} dx_{ij}$. 

Throughout this paper all general discussions will be made in terms of the 
denumerable case, because I feel that perhaps the ideas are made clearest that 
way. The extension to the nondenumerable case might in general lead to measure 
theoretic difficulties, but such difficulties do not arise in the applications (most 
of them being nondenumerable cases) treated here. 

The most powerful critical region of size, say, $\beta_i$ ($< 1$) (which under fairly
general conditions will exist and which under slightly less general conditions will
also be unique), of a simple hypothesis $H_0$ against a simple alternative $H_i$ (such
that $H_0, H_i \in \Omega$, where $i = 1, 2, \cdots$, and where $\Omega$ stands for a domain of
possibilities) will be denoted by $\omega(H_0, H_i, \beta_i)$, its complement, the acceptance
region, by $\bar\omega(H_0, H_i, \beta_i)$, to indicate that in general both will depend upon
$H_0$, $H_i$ and $\beta_i$. The union of regions $\omega(H_0, H_i, \beta_i)$ over different $H_i$, $\beta_i$ or
$i$ ($i = 1, 2, \cdots$) will be denoted by $\bigcup_{H_i} \omega(H_0, H_i, \beta_i)$ or simply by
$\bigcup_i \omega(H_0, H_i, \beta_i)$, and the intersection of regions $\bar\omega(H_0, H_i, \beta_i)$ over
different $i$ ($i = 1, 2, \cdots$) by $\bigcap_{H_i} \bar\omega(H_0, H_i, \beta_i)$ or simply by
$\bigcap_i \bar\omega(H_0, H_i, \beta_i)$. $P(H_0, H_i, \beta_i)$ will stand for the power of the most
powerful test at level $\beta_i$ of $H_0$ against $H_i$, and will in general depend upon all
the three elements. It can be easily proved and has been published in an earlier
paper [14] that $P(H_0, H_i, \beta_i) > \beta_i$. For convenience a sketch is given here.
Assume, for simplicity of discussion but without any essential loss of generality,
that we have a set of $n$ continuous stochastic variates, $\underline{x}(:n \times 1)$ or simply $x$,
with respective probability densities $\phi_{H_0}(x)$ and $\phi_{H_i}(x)$ (or simply $\phi_{H_0}$
and $\phi_{H_i}$) under the hypotheses $H_0$ and $H_i$. Then it is well known that
$\omega(H_0, H_i, \beta_i)$ and $\bar\omega(H_0, H_i, \beta_i)$ are given respectively by


(2.1)   $\omega(H_0, H_i, \beta_i): \phi_{H_i} \ge \lambda \phi_{H_0},$
        $\bar\omega(H_0, H_i, \beta_i): \phi_{H_i} < \lambda \phi_{H_0},$


where $\lambda$ is determined by
$P(x \in \omega(H_0, H_i, \beta_i) \mid H_0) = \beta_i.$


Assume here that $\phi$ is such that $\omega$ defined by (2.1) is unique. Integrating the
first inequality of (2.1) over $\omega(H_0, H_i, \beta_i)$ and the second one over
$\bar\omega(H_0, H_i, \beta_i)$ we have respectively $P(H_0, H_i, \beta_i) \ge \lambda\beta_i$ and
$1 - P(H_0, H_i, \beta_i) < \lambda(1 - \beta_i)$, from which, after a slight reduction, we have







(2.2)   $P(H_0, H_i, \beta_i) > \beta_i.$


Note that in general $\lambda$ will be of the form $\lambda(H_0, H_i, \beta_i)$, depending on all the
elements. Incidentally, any critical region of size $\beta$ for $H_0$, whose power with
respect to an alternative $H$ is greater than or equal to $\beta$, will be called an
unbiased critical region for $H_0$ against $H$.
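
The inequality just derived can be checked numerically in the simplest possible case. The sketch below is an illustration only and is not taken from the paper: it assumes a single observation $x$ that is $N(0, 1)$ under $H_0$ and $N(\delta, 1)$, $\delta > 0$, under the alternative, so that the likelihood ratio is increasing in $x$ and the most powerful region of size $\beta$ is $x \ge c$. The function name `mp_power` and the chosen values of $\beta$ and $\delta$ are hypothetical.

```python
from statistics import NormalDist


def mp_power(beta, delta):
    """Power of the size-beta most powerful test of N(0, 1) against N(delta, 1).

    Assumes delta >= 0, so the likelihood ratio is nondecreasing in x and the
    most powerful region has the form x >= cutoff.
    """
    std = NormalDist()                      # standard normal: the law of x under H0
    cutoff = std.inv_cdf(1.0 - beta)        # P(x >= cutoff | H0) = beta
    return 1.0 - std.cdf(cutoff - delta)    # P(x >= cutoff | H1), with x - delta ~ N(0, 1)


if __name__ == "__main__":
    for beta in (0.01, 0.05, 0.10):
        for delta in (0.5, 1.0, 2.0):
            power = mp_power(beta, delta)
            assert power > beta             # power exceeds size, as derived above
            print(f"beta={beta:.2f}  delta={delta:.1f}  power={power:.4f}")
```

At $\delta = 0$ the two hypotheses coincide and the function returns $\beta$ itself, which is the boundary case of the inequality.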

The likelihood ratio critical region at a level, say $\alpha$, of $H_0$ against the whole
class $H_i \in \Omega$, provided that it exists, will be denoted by $\omega(H_0, \alpha)$. As is well
known it is given by


(2.3)   $\omega(H_0, \alpha): \phi(x) \ge \mu(H_0, \alpha)\, \phi_{H_0}(x),$


where, for a given $x$, $\phi(x)$ stands for the largest $\phi_{H_i}(x)$ (provided that it exists)
with respect to variation of $H_i$ over $\Omega$, and where $\mu(H_0, \alpha)$ is given by


(2.4)   $P(x \in \omega(H_0, \alpha) \mid H_0) = \alpha.$


Notice that $\phi(x)$ is a function of $x$ only, being independent of $H_i$, but may
depend on the total domain $\Omega$. The power of this test against any alternative $H_i$
will be denoted by $P(H_0, H_i, \alpha)$.

Assume now that $H_0$ is a composite hypothesis and $H_i$ ($i = 1, 2, \cdots$) a composite
alternative. In earlier papers [8], [13], [14] the author gave a set of sufficient
conditions on $\phi_{H_0}$ for the availability of similar regions for $H_0$, and a set
of (further) restrictions on $\phi_{H_0}$ and $\phi_{H_i}$ for the availability, among these similar
regions, of one which is most powerful for $H_0$ against $H_i$ in the following sense.
Suppose $H_0$ and $H_i$ are composite hypotheses, each characterized by some specified
and some unspecified elements, so that, if the unspecified elements were
specified, both $H_0$ and $H_i$ would be simple hypotheses. Now suppose that, among
the similar regions for $H_0$, there is one whose location in the sample space depends
on the specified elements of $H_0$ and possibly on those of $H_i$, but not on
the unspecified elements of $H_0$ or $H_i$, but which is nevertheless the most powerful
critical region for any simple hypothesis within $H_0$ (obtained by specifying
the unspecified elements) against any simple alternative within $H_i$ (obtained by
specifying the unspecified elements). But this "most powerful" is "most powerful
among similar regions." If we drop the restriction of similarity and set up
in a straightforward manner the most powerful critical region for the simple
hypothesis in question against the simple alternative in question, then we may
get a (nonsimilar) region having a larger power than that of the most powerful
similar critical region just referred to. Such a most powerful similar critical
region may be conveniently called a bisimilar region for $H_0$ against $H_i$. The
likelihood ratio critical region for composite $H_0$ against all composite $H_i \in \Omega$ (which
we know how to construct, provided that it exists), can be shown [13], [14] to
be a similar region for $H_0$, under the restrictions just referred to. In this situation
the same notation will be used as introduced in the previous paragraph for
the case of a simple hypothesis against simple alternatives, and the result (2.2)
will also hold, it being noted that, while the regions will be independent of the
unspecified elements in $H_0$ and $H_i$, $P(H_0, H_i, \beta_i)$ and $P(H_0, H_i, \alpha)$, however,
might depend on the unspecified elements of $H_i$ though not on those of $H_0$.


3. Type I and Type II tests. 

3.1. Definitions and some remarks. Consider, for simplicity of discussion but
without any essential loss of generality (for the definitions could be immediately
carried over into the case of composite hypothesis and alternative) a simple
hypothesis $H_0$ against a simple alternative $H_i$ such that $H_0, H_i$ ($i = 1, 2, \cdots$) $\in \Omega$.

(i) Put $\beta_i = \beta$ ($i = 1, 2, \cdots$) and set up as the rejection and acceptance
regions for $H_0$, $\bigcup_i \omega(H_0, H_i, \beta)$ and its complement $\bigcap_i \bar\omega(H_0, H_i, \beta)$, to be
called, respectively, $\cup_i$ and $\cap_i$. This is defined to be a Type I test for $H_0$ against
the whole class $H_i \in \Omega$, the level of significance $\alpha$ being given by


(3.1.1)   $P(x \in \bigcup_i \omega(H_0, H_i, \beta) \mid H_0) = \alpha(H_0, \beta), \quad (> \beta).$


Let us for the moment assume nontriviality, that is, that given $\alpha < 1$, we can
find $\beta = \beta(H_0, \alpha) > 0$, for which (3.1.1) will hold.

(ii) Put, in Section 2, $\lambda(H_0, H_i, \beta_i) = \mu$ (a preassigned constant) for all
$i = 1, 2, \cdots$, and rewrite $\omega(H_0, H_i, \beta_i)$ and $\bar\omega(H_0, H_i, \beta_i)$ as $\omega^*(H_0, H_i, \mu)$
and $\bar\omega^*(H_0, H_i, \mu)$.

Now set up, as the rejection and acceptance regions for $H_0$, $\bigcup_i \omega^*(H_0, H_i, \mu)$
and its complement $\bigcap_i \bar\omega^*(H_0, H_i, \mu)$, to be called, respectively, $\cup_i^*$ and $\cap_i^*$,
where the $\beta_i$'s ($i = 1, 2, \cdots$) are subject to $\lambda(H_0, H_i, \beta_i) = \mu$ (a preassigned
constant). This is defined to be a Type II test for $H_0$ against the whole class $H_i \in \Omega$,
the level of significance $\alpha^*$ being given by


(3.1.2)   $P(x \in \bigcup_i \omega^*(H_0, H_i, \mu) \mid H_0) = \alpha^*(H_0, \mu).$


Here again let us, for the moment, assume nontriviality, that is, that given
$\alpha^*$ ($< 1$), we can find a $\mu$ such that $\beta(H_0, H_i, \mu) = \beta_i$ ($> 0$) and that (3.1.2) will
hold. This can be easily recognized as the likelihood ratio test by the following
consideration. Notice that $\omega^*(H_0, H_i, \mu)$ (with a preassigned $\mu$) is given by


(3.1.3)   $\omega^*(H_0, H_i, \mu): \phi_{H_i}(x) \ge \mu\, \phi_{H_0}(x).$


Any $x$ would belong to $\bigcup_i \omega^*(H_0, H_i, \mu)$ if, for that $x$, there were at least one
$H_i \in \Omega$ for which (3.1.3) holds. It is easy to see that this would be accomplished
if for that $x$ the largest $\phi_{H_i}(x)$ (under variation of $H_i$ over $\Omega$) were $\ge \mu\, \phi_{H_0}(x)$.
Hence it is obvious that


(3.1.4)   $\bigcup_i \omega^*(H_0, H_i, \mu): \phi(x) \ge \mu\, \phi_{H_0}(x),$
          $\bigcap_i \bar\omega^*(H_0, H_i, \mu): \phi(x) < \mu\, \phi_{H_0}(x).$
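
The identification of the Type II region with the likelihood ratio region can be made concrete in a small denumerable case. The following sketch is an illustration under assumptions that are not in the paper: a binomial observation with $n = 10$ trials, $H_0$ given by $p = 0.5$, a finite grid of alternatives, and a preassigned constant $\mu = 3$. All function names are hypothetical. It builds the union of the regions of the form (3.1.3) and the region defined through the largest alternative density, and checks that the two coincide.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def union_of_regions(n, p0, alternatives, mu):
    """Union over alternatives p of the regions {x : pmf_p(x) >= mu * pmf_p0(x)}."""
    region = set()
    for p in alternatives:
        region |= {x for x in range(n + 1)
                   if binom_pmf(x, n, p) >= mu * binom_pmf(x, n, p0)}
    return region


def likelihood_ratio_region(n, p0, alternatives, mu):
    """Region {x : largest alternative pmf at x >= mu * pmf_p0(x)}."""
    return {x for x in range(n + 1)
            if max(binom_pmf(x, n, p) for p in alternatives)
            >= mu * binom_pmf(x, n, p0)}


def size(region, n, p0):
    """Probability of the region under the null value p0."""
    return sum(binom_pmf(x, n, p0) for x in region)


if __name__ == "__main__":
    n, p0, mu = 10, 0.5, 3.0                                  # illustrative values
    alternatives = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # illustrative grid
    union = union_of_regions(n, p0, alternatives, mu)
    lr = likelihood_ratio_region(n, p0, alternatives, mu)
    assert union == lr
    print(sorted(union), size(union, n, p0))
```

The equality is a set identity: a point lies in some region of the form (3.1.3) exactly when the largest alternative density at that point clears the same threshold.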


3.2. An obvious property of the two types of test. Notice that $\cup_i$ includes all
$\omega(H_0, H_i, \beta)$ and $\cup_i^*$ all $\omega^*(H_0, H_i, \mu)$. Now putting


$P(x \in \cup_i \mid H_i) = P(\cup_i, H_i, \alpha)$ and $P(x \in \cup_i^* \mid H_i) = P(\cup_i^*, H_i, \alpha)$


we shall have from Sections 2 and 3 for the two types of tests

(3.2.1)   $\beta(H_0, \alpha) = \beta < P(H_0, H_i, \beta) \le P(\cup_i, H_i, \alpha) \le P(H_0, H_i, \alpha) \le 1,$
          $P(H_0, H_i, \alpha) > \alpha;$

(3.2.2)   $\beta^*(H_0, H_i, \alpha) = \beta_i^* < P^*(H_0, H_i, \mu) \le P(\cup_i^*, H_i, \alpha) \le P(H_0, H_i, \alpha) \le 1,$
          $P(H_0, H_i, \alpha) > \alpha.$


(3.2.1) and (3.2.2) give respectively, for all $H_i \in \Omega$, the lower bounds $P(H_0, H_i, \beta)$
and $P^*(H_0, H_i, \mu)$ for $P(\cup_i, H_i, \alpha)$ and $P(\cup_i^*, H_i, \mu)$, which, however, in
general, would be far from close except sometimes for large "deviations" from
$H_0$. With more knowledge of the forms of $\phi_{H_0}$ and $\phi_{H_i}$ it is often possible to get
far closer lower bounds; even the actual powers are often computable without 
much difficulty (and turn out to be pretty high) as for example in most of the 
classical tests on normal populations. 

It is easy to see that the results of (3.1) and (3.2) could be generalized
to cover the case of composite $H_0$ against composite $H_i \in \Omega$ provided that we have
similar regions for $H_0$ and a bisimilar region for $H_0$ against $H_i$. This, therefore,
need not be separately treated. 

3.3. Display of two classical tests as Type I tests. (i) Almost all classical tests 
on univariate and multivariate normal populations (ii) most classical tests on 
other types of populations and (iii) many tests on multivariate normal popu- 
lations proposed in recent years are known to be derivable (and indeed many 
of them have, in fact, been derived) from the “likelihood ratio’ principle, so 
that they belong to Type II. The author finds that all the customary tests in 
category (i), for example, the test of significance of (1) a mean, (2) a mean dif- 
ference, (3) total or partial or multiple correlation, and (4) regressions, (5) the 
F-test in analysis of variance, (6) the test of the hypothesis of equality of standard
deviations for two univariate normal populations, (7) the test based on
Hotelling's $T^2$, all belong to Type I as well. Those classical tests in category (ii)
that the author has examined so far also all belong to Type I. Coming to those 
situations that are sought to be handled by tests proposed under category (iii), 
the author finds that the likelihood ratio tests offered so far, while they auto- 
matically belong to Type II, do not belong to Type I. On the other hand, if, 
in these situations, one carries out (see Section 5) the spirit and method of dis- 
criminant analysis, one gets tests (see Section 6) which belong to Type I in a 
sense slightly more general than we have indicated so far. 

In this section we consider, for illustration, two well known classical tests 
and show that they belong to Type I. 

(i) For $N(\xi_1, \sigma^2)$ and $N(\xi_2, \sigma^2)$ the classical test of $H(\xi_1 = \xi_2) = H_0$ against
$H(\xi_1 \ne \xi_2) = H$ at a level $\alpha$ is based on a critical region given by


(3.3.1)   $t \ge t_0$ or $\le -t_0,$


where
$t = (n_1 + n_2 - 2)^{1/2} \{n_1 n_2/(n_1 + n_2)\}^{1/2} (\bar{x}_1 - \bar{x}_2) \big/ \{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\}^{1/2},$


and $t_0$ is given by $P(t \ge t_0 \mid H_0) = \alpha/2$ and where $(\bar{x}_1, \bar{x}_2)$, $(s_1, s_2)$ stand for the
means and standard deviations of two random samples of sizes $n_1$ and $n_2$ drawn
from $N(\xi_1, \sigma^2)$ and $N(\xi_2, \sigma^2)$, respectively. This is well known as a likelihood
ratio test but it is easily checked as Type I as well, in the following way. It is
well known that $t \ge t_0$ is a one-sided uniformly most powerful (bisimilar) region
of size $\alpha/2$ for the composite $H_0$ against the composite $H(\xi_1 > \xi_2) = H_1$ and so
also is $t \le -t_0$ for $H_0$ against $H(\xi_1 < \xi_2) = H_2$; taking the union we have
(3.3.1) of size $\alpha$.
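
The union construction in (i) is simple enough to spell out. The sketch below is illustrative only: it computes the statistic $t$ from two samples and forms the two-sided region as the union of the two one-sided regions. The function names are hypothetical, and the critical value $t_0$ is taken as an input (for instance from a table of the $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom) rather than computed.

```python
import math


def two_sample_t(x, y):
    """Statistic t of (3.3.1) for two samples assumed to share a common variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)     # equals (n1 - 1) * s1^2
    ss2 = sum((v - m2) ** 2 for v in y)     # equals (n2 - 1) * s2^2
    return (math.sqrt(n1 + n2 - 2) * math.sqrt(n1 * n2 / (n1 + n2))
            * (m1 - m2) / math.sqrt(ss1 + ss2))


def in_upper_region(t, t0):
    """One-sided region directed against xi_1 > xi_2."""
    return t >= t0


def in_lower_region(t, t0):
    """One-sided region directed against xi_1 < xi_2."""
    return t <= -t0


def in_type1_region(t, t0):
    """Union of the two one-sided regions, that is, |t| >= t0."""
    return in_upper_region(t, t0) or in_lower_region(t, t0)


if __name__ == "__main__":
    t0 = 2.776   # tabulated upper 2.5 per cent point of t with 4 degrees of freedom
    for x, y in (([1, 2, 3], [2, 3, 4]), ([1, 2, 3], [7, 8, 9])):
        t = two_sample_t(x, y)
        print(round(t, 4), in_type1_region(t, t0))
```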

(ii) Consider the testing of a general linear hypothesis in analysis of variance
which, as is well known, can be formally reduced to the following. Suppose we
have random samples of sizes $n_i$, means $\bar{x}_i$ and standard deviations $s_i$, drawn
respectively from $N(\xi_i, \sigma^2)$ ($i = 1, \cdots, k$), and suppose we want to test
$H(\xi_1 = \xi_2 = \cdots = \xi_k) = H_0$ against the whole class $H$ of $(\xi_1, \cdots, \xi_k)$ violating
$H_0$. Put $n = \sum_{i=1}^{k} n_i$; $\bar{x} = \sum_{i=1}^{k} n_i \bar{x}_i/n$; $\bar\xi = \sum_{i=1}^{k} n_i \xi_i/n$. Now the classical
$F$-test for $H_0$, which is well known to be a likelihood ratio or Type II test, has
at a level $\alpha$ the critical region given by


(3.3.2)   $F \ge F_0,$


where $F = \left[\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2/(k - 1)\right] \div \left[\sum_{i=1}^{k} (n_i - 1)s_i^2/(n - k)\right]$ and
where $F_0$ is given by $P(F \ge F_0 \mid H_0) = \alpha$.

To recognize this as a Type I test as well we proceed as follows. It was observed
in earlier papers [8], [13] that among similar regions for $H_0$ (which exist)
there is a most powerful (bisimilar) region for $H_0$ against any specific
$(\xi_1, \cdots, \xi_k) = \xi$ violating $H_0$, the region of size, say, $\beta$ being given by


(3.3.3)   $t \ge t_0,$


where $t = \sqrt{n - 2}\, \cot\theta$;


$\cos\theta = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})(\xi_i - \bar\xi) \Big/ \left[\sum_{i=1}^{k} \{n_i(\bar{x}_i - \bar{x})^2 + (n_i - 1)s_i^2\}\right]^{1/2} \left[\sum_{i=1}^{k} n_i(\xi_i - \bar\xi)^2\right]^{1/2},$


and where $t_0$ is given by $P(t \ge t_0 \mid H_0) = \beta$. It was also noticed in those papers
that this $t$ has exactly the usual $t$-distribution with $(n - 2)$ degrees of freedom.
Notice that $t_0 = t_0(n, \beta)$ and $\beta = \beta(n, t_0)$. To obtain now the union of regions
$t \ge t_0$ over different sets of $(\xi_1, \cdots, \xi_k)$ we note that a given set of (observed)
$\bar{x}_i$'s and $s_i$'s would belong to the union, if for that set there were at least one $t$
such that $t \ge t_0$. The union is thus easily checked to be given by: the largest $t$
(by varying over $\xi_1, \cdots, \xi_k$) $\ge t_0$ (which is fixed). But by (3.3.3) the largest $t$
would correspond to the largest value of $\cos\theta$, and, given $\bar{x}_i$'s and $s_i$'s, the largest
value of $\cos\theta$ (under variation over $\xi_1, \cdots, \xi_k$) is easily seen to be given by:


(3.3.4)   $\cos\theta = \left[\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2\right]^{1/2} \Big/ \left[\sum_{i=1}^{k} \{(n_i - 1)s_i^2 + n_i(\bar{x}_i - \bar{x})^2\}\right]^{1/2},$


so that the largest $t$ is given by


(3.3.5)   $t = (n - 2)^{1/2} \left[\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2\right]^{1/2} \Big/ \left[\sum_{i=1}^{k} (n_i - 1)s_i^2\right]^{1/2}.$


Therefore the union of regions $t \ge t_0$ is given exactly by (3.3.2), which is the
critical region of the $F$-test. Notice that given the $\alpha$ of the $F$-test, $F_0$ is obtained
from (3.3.2) in the form $F_0(k - 1, n - k; \alpha)$; and next by identifying the union
of regions $t \ge t_0$ with $F \ge F_0$ we have


$t_0 = [(k - 1)(n - 2)F_0/(n - k)]^{1/2} = t_0(k - 1, n - k; \alpha),$

and next from (3.3.3) we have

$\beta = \beta(n, t_0) = \beta(k - 1, n - k; \alpha).$
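
The passage from the one-sided $t$-regions to the $F$-region rests on one algebraic identity: the square of the largest $t$ equals $(k - 1)(n - 2)F/(n - k)$. The sketch below checks this numerically; it is an illustration with hypothetical function names and made-up data, not code from the paper. It evaluates $t$ for an arbitrary alternative $(\xi_1, \cdots, \xi_k)$ and confirms that no alternative gives a larger $t$ than the choice $\xi_i = \bar{x}_i$.

```python
import math
import random


def group_stats(samples):
    """Sizes, means, grand mean and between/within sums of squares of k samples."""
    n_i = [len(s) for s in samples]
    means = [sum(s) / len(s) for s in samples]
    n = sum(n_i)
    grand = sum(ni * m for ni, m in zip(n_i, means)) / n
    between = sum(ni * (m - grand) ** 2 for ni, m in zip(n_i, means))
    within = sum(sum((v - m) ** 2 for v in s) for s, m in zip(samples, means))
    return n_i, means, n, grand, between, within


def f_statistic(samples):
    """F of (3.3.2)."""
    _, _, n, _, between, within = group_stats(samples)
    k = len(samples)
    return (between / (k - 1)) / (within / (n - k))


def t_for_alternative(samples, xi):
    """t of (3.3.3) for a specific alternative xi; the xi_i must not all be equal."""
    n_i, means, n, grand, between, within = group_stats(samples)
    xi_bar = sum(ni * x for ni, x in zip(n_i, xi)) / n
    num = sum(ni * (m - grand) * (x - xi_bar) for ni, m, x in zip(n_i, means, xi))
    den = (math.sqrt(between + within)
           * math.sqrt(sum(ni * (x - xi_bar) ** 2 for ni, x in zip(n_i, xi))))
    cos_theta = num / den
    return math.sqrt(n - 2) * cos_theta / math.sqrt(1.0 - cos_theta ** 2)


def largest_t(samples):
    """Largest t over all alternatives, attained at xi_i equal to the sample means."""
    _, means, _, _, _, _ = group_stats(samples)
    return t_for_alternative(samples, means)


if __name__ == "__main__":
    samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.5, 7.0, 8.5]]  # made-up data
    k, n = len(samples), sum(len(s) for s in samples)
    f_value, t_max = f_statistic(samples), largest_t(samples)
    assert abs(t_max ** 2 - (k - 1) * (n - 2) * f_value / (n - k)) < 1e-9
    rng = random.Random(0)
    for _ in range(200):
        xi = [rng.gauss(0.0, 1.0) for _ in range(k)]
        assert t_for_alternative(samples, xi) <= t_max + 1e-9
    print(f_value, t_max)
```

Because the map from $F$ to the largest $t$ is increasing, the region "largest $t \ge t_0$" and the region $F \ge F_0$ coincide once $t_0$ and $F_0$ are matched as in the text.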


3.4. Some further remarks on the two types of test. It may be noted (see Sections
2 and 3) that by specializing the $\beta_i$'s (the sizes of the most powerful critical
regions against different alternatives) in two special ways we get in a heuristic
manner the two types of test. By specializing the $\beta_i$'s in other ways other
heuristic principles could be set up, some of which, in special situations, might be
"better" than the Type I or Type II tests. It has already been observed that in
many situations Type I and Type II tests would coincide. This does not mean,
however, that in those situations, $\beta(H_0, H_i, \alpha)$ of the Type II test would be
the $\beta$ of the Type I test. Given $H_0$ and the $H_i$'s, it would be possible to find a
$\beta$ for Type I and a $\mu$ for Type II such that the same critical region for $H_0$ against
the whole class $H_i \in \Omega$ could be looked upon as $\bigcup_{H_i} \omega(H_0, H_i, \beta)$ in relation to
the first type and also as $\bigcup_{H_i} \omega^*(H_0, H_i, \mu)$ in relation to the second type.

The following theoretical question or group of questions now under investigation
is extremely important. Under what general restrictions on the probability
law of $x$ and on $H_0$ and $H_i \in \Omega$ would either or both of the tests be nontrivial
(in the sense discussed in Section 3) and usable (in the sense of having a distribu- 
tion problem amenable to tabulation), and unbiased (against all relevant alter- 
natives) and/or admissible and/or reasonably powerful (in the sense of having 
not too bad a power against all relevant alternatives)? So far as the author is 
aware, these questions have not yet been adequately discussed in a general 
manner (let alone been answered) even for the likelihood ratio or Type II test 
(which has so long been extensively used in practice), and no attempt will be 
made in this paper to discuss these questions. The advantage, however, of hav- 
ing two such heuristic principles (with the possibility of having two different 
tests in many situations) is that it gives us more elbow room than we would 
have had with one such principle, in the matter of construction of nontrivial, 
usable and "pretty good" tests.







4. Extended Type I test (and an obvious property of it). Consider a composite
hypothesis $H_0$ against a set of composite alternatives $H_i \in \Omega$ ($i = 1, 2, \cdots$).
It often happens, as for example in the three broad situations discussed in Section 5,
that, while there are similar regions for $H_0$, there is among these no
most powerful (bisimilar) region for $H_0$ against any $H_i$ ($i = 1, 2, \cdots$), but that
we have instead the following situation. Suppose we have composite hypotheses
$H_{0j}$ ($j = 1, 2, \cdots$) such that $\bigcap_j H_{0j} = H_0$ and composite alternatives
$H_{ij}$ ($i = 1, 2, \cdots$; $j = 1, 2, \cdots$) such that $\bigcap_j H_{ij} = H_i$. Notice that $H_{0j}$ and
$H_{ij}$ have more unspecified elements than $H_0$ and $H_i$ respectively. It may well
be that we have (as in the cases discussed in Section 5) not only similar regions
for $H_{0j}$ but also, among these, a most powerful (bisimilar) region for $H_{0j}$ against
any $H_{ij}$ (one for each $i$ with $j = 1, 2, \cdots$; and $i = 1, 2, \cdots$). Consider critical
regions $\omega(H_{0j}, H_{ij}, \beta)$ of size $\beta$ each. Then by our test procedure, over $\bigcap_i \bigcap_j$ of
$\bar\omega(H_{0j}, H_{ij}, \beta)$ (which we call $\cap_{ij}$ for simplicity), we are anyway accepting
$\bigcap_j H_{0j}$, that is, $H_0$, and over its complement $\bigcup_i \bigcup_j \omega(H_{0j}, H_{ij}, \beta)$ we are rejecting
at least one $H_{0j}$ and therefore $H_0$ itself. Suppose we set this up as a heuristic
test for $H_0$ against the whole class $H_i \in \Omega$. Then the critical region will be
$\bigcup_i \bigcup_j \omega(H_{0j}, H_{ij}, \beta)$ or $\cup_{ij}$ of size $\alpha$, given by


(4.1)   $P(x \in \cup_{ij} \mid H_0) = \alpha,$


so that $\alpha = \alpha(H_0, \beta)$ and $\beta = \beta(H_0, \alpha)$. As before, nontriviality will be assumed,
and it is easy to check that we shall have for all $i$ and $j$ the following inequality:


(4.2)   $\beta < P(H_{0j}, H_{ij}, \beta) \le P(\cup_{ij}, H_i, \alpha) \le 1.$


It may be noted that while $\omega(H_{0j}, H_{ij}, \beta)$, a bisimilar region of size $\beta$ for $H_{0j}$
against $H_{ij}$, is independent of the unspecified elements of $H_{0j}$ and $H_{ij}$ and while
the location of $\cup_{ij}$ must be and its size might be (as indeed it is for all the cases
considered in Section 5) independent of the unspecified elements of $H_{0j}$ and
$H_{ij}$, $P(H_{0j}, H_{ij}, \beta)$ might involve the unspecified elements of $H_{ij}$ and
$P(H_0, H_i, \alpha)$ those of $H_i$. As observed in Section 3, the lower bound to
the power of the test, given by (4.2), while it is in general easily available, is,
at the same time, much too crude. With more knowledge of the probability
law a much closer lower bound can often be found, as will be exemplified in later
sections.


5. Application to three multivariate problems. 

5.1. Statement of the problems. Three different types of hypotheses will be 
discussed here, namely, (i) the hypothesis of equality of covariance matrices of 
two p-variate normal populations, (ii) the hypothesis of equality of k means for 
each of p variates for k p-variate normal populations with the same covariance 
matrix (which is formally tied up with the general problem of testing a linear 
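hypothesis), and (iii) the hypothesis that in a $(p_1 + p_2)$-variate normal population
the set of, say, the first $p_1$ variates is uncorrelated with the set of the last
$p_2$ variates. In symbols, using the notation given in Section 2, we can rewrite
these hypotheses as (i) $H(\Sigma_1 = \Sigma_2)$ $(= H_0)$ against all $H(\Sigma_1 \ne \Sigma_2)$ $(= H)$, (ii)
$H(\xi_1 = \xi_2 = \cdots = \xi_k)$ $(= H_0)$ (assuming a common $\Sigma$) against all $H(\ne H_0)$
(assuming again a common $\Sigma$) and (iii) $H(\Sigma_{12} = 0)$ $(= H_0)$ against all $H(\Sigma_{12} \ne 0)$
$(= H)$, where the $(p_1 + p_2)$ variates have a covariance matrix $\Sigma$ of the following
structure:


(5.1)   $\Sigma = \left(\begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{array}\right) \begin{array}{l} p_1 \\ p_2 \end{array}$, the column blocks likewise being of widths $p_1$ and $p_2$.
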

5.2. Direct Type I construction not possible. It is well known that there are
infinitely many similar regions for each of the above composite hypotheses
but no most powerful (bisimilar) region for $H(\Sigma_1 = \Sigma_2)$ against any specific
$H(\Sigma_1 \ne \Sigma_2)$ or for $H(\xi_1 = \cdots = \xi_k)$ $(= H_0)$ against any specific $H$ violating $H_0$
or for $H(\Sigma_{12} = 0)$ against any specific $H(\Sigma_{12} \ne 0)$, so that direct Type I
construction will not work here.

5.3. Reduction to pseudo-univariate and pseudo-bivariate problems. At this
point suppose that, starting from an $x(:p \times 1)$ which is $N(\xi, \Sigma)$, we consider a
linear compound of $x$, namely $\mu'x$ (with an arbitrary constant, that is, nonstochastic
$\mu'(:1 \times p)$ of nonzero modulus) which is a scalar well known to be
$N(\mu'\xi, \mu'\Sigma\mu)$. Note that $\mu'\xi$ and $\mu'\Sigma\mu$ are also scalars. Suppose further that we
also start from


$x = \left(\begin{array}{c} x_1 \\ x_2 \end{array}\right) \begin{array}{l} p_1 \\ p_2 \end{array} : N\left\{ \left(\begin{array}{c} \xi_1 \\ \xi_2 \end{array}\right), \left(\begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{array}\right) \right\},$


and consider linear compounds $\mu_1'x_1$ and $\mu_2'x_2$ (where $\mu_1(:p_1 \times 1)$ and $\mu_2(:p_2 \times 1)$
are each nonnull); then these two scalars are well known to be distributed as a
bivariate normal with a correlation coefficient


(5.3.1)   $\rho(\mu_1, \mu_2) = \rho_{\mu_1\mu_2} = \mu_1'\Sigma_{12}\mu_2 \big/ \{(\mu_1'\Sigma_{11}\mu_1)^{1/2} (\mu_2'\Sigma_{22}\mu_2)^{1/2}\}.$

Now suppose that, in place of (i), (ii) and (iii) of 5.1, we consider respectively


(iv) $H(\mu'\Sigma_1\mu = \mu'\Sigma_2\mu)$ $(= H_{0\mu})$ against all $H(\mu'\Sigma_1\mu \ne \mu'\Sigma_2\mu)$ $(= H_\mu)$, ($\mu$ fixed),

(v) $H(\mu'\xi_1 = \cdots = \mu'\xi_k)$ $(= H_{0\mu})$ against all $H_\mu$ $(\ne H_{0\mu})$, ($\mu$ fixed) and

(vi) $H(\mu_1'\Sigma_{12}\mu_2 = 0)$ $(= H_{0\mu_1\mu_2})$ against all $H(\mu_1'\Sigma_{12}\mu_2 \ne 0)$ $(= H_{\mu_1\mu_2})$ ($\mu_1$, $\mu_2$
fixed).


We now consider the totality of all nonnull $\mu$ for (iv) and (v) and all nonnull
$\mu_1$ and $\mu_2$ for (vi). Notice that (i) $\bigcap_\mu H(\mu'\Sigma_1\mu = \mu'\Sigma_2\mu) = H(\Sigma_1 = \Sigma_2)$, (ii)
$\bigcap_\mu H(\mu'\xi_1 = \cdots = \mu'\xi_k) = H(\xi_1 = \cdots = \xi_k)$ and (iii) $\bigcap_{\mu_1\mu_2} H(\mu_1'\Sigma_{12}\mu_2 = 0) =
H(\Sigma_{12} = 0)$. We could have worked in terms of any subset of such $\mu$'s which
leads by intersection to the same $H_0$, but this we do not do here. It may be
noted that by the procedure to be used here, apart from measure-theoretic
difficulties which, however, do not arise in these applications, the total set of
$\mu$'s or any subset of it (of the kind considered) will uniquely define an extended
Type I test associated with the total set or with that particular subset. Next suppose
that, in the alternative, under (iv), (v) and (vi), we substitute "specific" for
"all" and thus have three new situations (vii), (viii) and (ix). It is well known
that for each of the situations (vii), (viii) and (ix) we have one most powerful
(bisimilar) region, so that from these we can construct respective Type I regions
for the univariate situations (iv) and (v) and the bivariate situation (vi), and
from these Type I tests we can try to construct the respective extended Type I
tests for the situations (i), (ii) and (iii). This ties up (in the sense of Section 4) the two
$p$-variate problems (i) and (ii) with the two univariate problems (iv) and (v),
and the $(p_1 + p_2)$-variate problem (iii) with the bivariate problem (vi).

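5.4. A useful notation and reduction. For an observation matrix $X(:p \times n)$
with elements $x_{ij}$ ($i = 1, 2, \cdots, p$; $j = 1, 2, \cdots, n$), let us put $\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n$
($i = 1, \cdots, p$) and $\bar{x}' = (\bar{x}_1, \cdots, \bar{x}_p)$. Then the covariance matrix $S(:p \times p)$
will be given by $(n - 1)S = XX' - n\bar{x}\bar{x}'$. Now suppose that in the situation
(i) we have two observation matrices $X_r(:p \times n_r)$, $p \le n_r - 1$, two mean
vectors $\bar{x}_r(:p \times 1)$ and two covariance matrices $S_r(:p \times p)$ such that
$(n_r - 1)S_r = X_rX_r' - n_r\bar{x}_r\bar{x}_r'$ ($r = 1, 2$), so that $S_r$ is always at least p.s.d. In
situation (ii) assume that we have $k$ observation matrices $X_r(:p \times n_r)$; mean vectors
$\bar{x}_r(:p \times 1)$; a grand mean vector $\bar{x}(:p \times 1)$ such that $\bar{x} = \sum_{r=1}^{k} n_r\bar{x}_r/n$, where
$n = \sum_{r=1}^{k} n_r$ and $p \le n - k$; a covariance matrix of means $S^*(:p \times p)$ defined
by $(k - 1)S^* = \bar{X}\bar{X}' - n\bar{x}\bar{x}'$, where $\bar{X}(:p \times k) = (\sqrt{n_1}\,\bar{x}_1, \cdots, \sqrt{n_k}\,\bar{x}_k)$; $k$ covariance
matrices $S_r(:p \times p)$ such that $(n_r - 1)S_r = X_rX_r' - n_r\bar{x}_r\bar{x}_r'$; a pooled
covariance matrix $S(:p \times p)$ such that $(n - k)S = \sum_{r=1}^{k} (n_r - 1)S_r$, so that
both $S^*$ and $S$ must be always at least p.s.d. Finally in situation (iii) suppose
that we have an observation matrix $X$ and a mean vector $\bar{x}$ given by:


$X = \left(\begin{array}{c} X_1 \\ X_2 \end{array}\right) \begin{array}{l} p_1 \\ p_2 \end{array}$ (with $n$ columns) and $\bar{x} = \left(\begin{array}{c} \bar{x}_1 \\ \bar{x}_2 \end{array}\right) \begin{array}{l} p_1 \\ p_2 \end{array},$


and a covariance matrix $S(:(p_1 + p_2) \times (p_1 + p_2))$ given by:


$(n - 1)S = (n - 1)\left(\begin{array}{cc} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{array}\right) = \left(\begin{array}{cc} X_1X_1' & X_1X_2' \\ X_2X_1' & X_2X_2' \end{array}\right) - n\left(\begin{array}{c} \bar{x}_1 \\ \bar{x}_2 \end{array}\right)(\bar{x}_1' \;\; \bar{x}_2').$


Here we observe that $S$ must be always at least p.s.d. and also assume that
$p_1 \le p_2$ and $p_1 + p_2 \le n - 1$.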

5.5 Type I tests for the situations (iv), (v) and (vi). 

(iv) Put $F_\mu = \mu'S_1\mu/\mu'S_2\mu$ and notice that, at a level $\beta$, for $H(\mu'\Sigma_1\mu = \mu'\Sigma_2\mu)$
$(= H_{0\mu})$ against all $H(\mu'\Sigma_1\mu > \mu'\Sigma_2\mu)$ we have the one-sided uniformly most
powerful (bisimilar) region: $F_\mu \ge F_0$, and for $H(\mu'\Sigma_1\mu = \mu'\Sigma_2\mu)$ $(= H_{0\mu})$ against
all $H(\mu'\Sigma_1\mu < \mu'\Sigma_2\mu)$ we have the one-sided uniformly most powerful region:
$F_\mu \le F_0'$, where $F_0$ and $F_0'$ are given by: $P(F_\mu \ge F_0 \mid H_{0\mu}) = P(F_\mu \le F_0' \mid H_{0\mu}) = \beta$.
Notice that this $F_\mu$ has the ordinary $F$-distribution with $(n_1 - 1)$ and $(n_2 - 1)$
degrees of freedom. The Type I critical region will now be of size $2\beta$, being given
by


(5.5.1)   $\omega_\mu: (F_\mu \ge F_0) \cup (F_\mu \le F_0')$, or: $F_\mu \ge F_0$ or $\le F_0'$.


For $n_1 = n_2$ this will be an unbiased critical region, but, for $n_1 \ne n_2$, this will
be biased for certain small deviations and unbiased for all large deviations from
the hypothesis. In any case, in this situation it is possible to construct a better
(but slightly more difficult) test, which will not be discussed here.

(v) For $H(\mu'\xi_1 = \cdots = \mu'\xi_k)$ $(= H_{0\mu})$ against any specific $H_\mu$ $(\ne H_{0\mu})$ there is
the most powerful (bisimilar) critical region (discussed in Section 3) (of size,
say, $\gamma$) which is a one-sided $t$-region, and by taking the union of these regions
(for fixed $\mu$ but by variations over $\xi_1, \cdots, \xi_k$), we have the Type I region given by


(5.5.2)   $F_\mu = \mu'S^*\mu/\mu'S\mu \ge F_0,$


where $F_0$ is obtained from $P(F_\mu \ge F_0 \mid H_{0\mu}) = \beta$.

This is also well known to be a Type II or likelihood ratio test having in this
situation various good properties (including unbiasedness and admissibility).
Notice that this $F_\mu$ has the ordinary $F$-distribution with $(k - 1)$ and $(n - k)$
degrees of freedom.

(vi) Put $r_{\mu_1\mu_2} = \mu_1'S_{12}\mu_2 \big/ (\mu_1'S_{11}\mu_1)^{1/2} (\mu_2'S_{22}\mu_2)^{1/2}$ and notice that, at a level $\beta$, for
$H(\mu_1'\Sigma_{12}\mu_2 = 0)$ $(= H_{0\mu_1\mu_2})$ against all $H(\mu_1'\Sigma_{12}\mu_2 > 0)$ we have the one-sided
uniformly most powerful (bisimilar) region: $r_{\mu_1\mu_2} \ge r_0$, and for $H_{0\mu_1\mu_2}$ against all
$H(\mu_1'\Sigma_{12}\mu_2 < 0)$ we have the one-sided uniformly most powerful (bisimilar)
region $r_{\mu_1\mu_2} \le -r_0$, where $r_0$ is given by:


(5.5.3)   $P(r_{\mu_1\mu_2} \ge r_0 \mid H_{0\mu_1\mu_2}) = \beta.$


Notice that $r_{\mu_1\mu_2}$ has the distribution of the ordinary total correlation coefficient
on a sample of size $n$. The Type I critical region will be of size $2\beta$, being given by


(5.5.4)   $\omega_{\mu_1\mu_2}: (r_{\mu_1\mu_2} \ge r_0) \cup (r_{\mu_1\mu_2} \le -r_0),$


that is, $|r_{\mu_1\mu_2}| \ge r_0$ or $r_{\mu_1\mu_2}^2 \ge r_0^2$.
This is well known to be also a Type II or likelihood ratio region having in this 
situation various good properties (including unbiasedness and admissibility). 

5.6. Actual construction of extended Type I tests for the situations (i), (ii) and
(iii).

(i) By the test procedure (5.5.1), over $F_0' < F_\mu < F_0$ we accept $H(\mu'\Sigma_1\mu =
\mu'\Sigma_2\mu)$, so that over $\bigcap_\mu [F_0' < F_\mu = \mu'S_1\mu/\mu'S_2\mu < F_0]$ we accept $\bigcap_\mu H(\mu'\Sigma_1\mu =
\mu'\Sigma_2\mu) = H(\Sigma_1 = \Sigma_2) = H_0$, and thus over its complement $\bigcup_\mu [F_\mu \ge F_0$ or $\le F_0']$
we reject $H_0$. This may thus be set up as the extended Type I test. To obtain
$\bigcup_\mu [F_\mu \ge F_0$ or $\le F_0']$ we note that a particular set of observations, that is, a
particular set of $(S_1, S_2)$, would belong to the union if for that $(S_1, S_2)$ there were
at least one $\mu$ such that $F_\mu \ge F_0$ or $\le F_0'$. It is thus easy to check that
$\bigcup_\mu [F_\mu \ge F_0$ or $\le F_0']$ is precisely equivalent to: the largest $F_\mu \ge F_0$ and/or the smallest
$F_\mu \le F_0'$, the "largest" and the "smallest" being under variation of $\mu$ (for a given
set of $S_1$, $S_2$). Now, given $(S_1, S_2)$, the largest and smallest values of $\mu'S_1\mu/\mu'S_2\mu$
are easily seen to be the largest and smallest roots, say $\theta_p$ and $\theta_1$, of the $p$-th
degree determinantal equation in $\theta$


(5.6.1)   $|S_1 - \theta S_2| = 0,$


all the $p$ roots $\theta_1, \theta_2, \cdots, \theta_p$ being in this situation a.e. positive, since $S_1$ and
$S_2$ are, by the definitions and assumptions of subsection 5.4 of Section 5, a.e.,
p.d. (each of rank $p$). Starting out from the Type I test (5.5.1) for $H_{0\mu}$ we have
for $H(\Sigma_1 = \Sigma_2)$ the extended Type I critical region


(5.6.2)   $\theta_p \ge F_0$ and/or $\theta_1 \le F_0'$.
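
The step from "largest and smallest $F_\mu$" to the extreme roots of the determinantal equation can be checked directly when $p = 2$, where the equation is a quadratic in $\theta$. The sketch below is an illustration only, with hypothetical function names and made-up matrices, each symmetric $2 \times 2$ matrix being stored as a triple $(a, b, d)$ for $[[a, b], [b, d]]$. It solves the quadratic and compares the roots with a direct scan of $\mu'S_1\mu/\mu'S_2\mu$ over directions $\mu$.

```python
import math


def roots_2x2(s1, s2):
    """Smallest and largest roots of |S1 - theta * S2| = 0 for 2x2 symmetric S1, S2.

    Expanding the determinant gives qa * theta^2 + qb * theta + qc = 0.
    S2 is assumed positive definite, so qa > 0 and both roots are real.
    """
    a1, b1, d1 = s1
    a2, b2, d2 = s2
    qa = a2 * d2 - b2 * b2
    qb = -(a1 * d2 + a2 * d1 - 2.0 * b1 * b2)
    qc = a1 * d1 - b1 * b1
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return (-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)


def quadratic_form(s, mu):
    """mu' S mu for S stored as (a, b, d)."""
    a, b, d = s
    return a * mu[0] ** 2 + 2.0 * b * mu[0] * mu[1] + d * mu[1] ** 2


def f_mu(s1, s2, mu):
    """Ratio mu' S1 mu / mu' S2 mu for a nonnull direction mu."""
    return quadratic_form(s1, mu) / quadratic_form(s2, mu)


def scan_ratio(s1, s2, steps=3600):
    """Ratios over a grid of directions; the ratio is unchanged by rescaling mu."""
    return [f_mu(s1, s2, (math.cos(math.pi * i / steps), math.sin(math.pi * i / steps)))
            for i in range(steps)]


if __name__ == "__main__":
    s1 = (4.0, 1.0, 3.0)     # made-up S1
    s2 = (2.0, 0.5, 1.0)     # made-up S2, positive definite
    theta_1, theta_p = roots_2x2(s1, s2)
    ratios = scan_ratio(s1, s2)
    print(theta_1, theta_p, min(ratios), max(ratios))
```

In the extended Type I test the pair (smallest root, largest root) is then compared with $(F_0', F_0)$; the critical values themselves come from the joint null distribution of the roots and are not computed here.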


To determine the size of this critical region, or more properly, given the size
$\alpha$, to find $F_0$ and $F_0'$, we have to have the joint distribution of $(\theta_1, \theta_2, \cdots, \theta_p)$
on the null hypothesis $H(\Sigma_1 = \Sigma_2)$ which was obtained in 1939 by a number of
workers [3], [4], [7], [10] and which was found to be independent of the common
value of $\Sigma_1 = \Sigma_2$ and also of $\xi_1$ and $\xi_2$, that is, of all nuisance parameters.
Starting from the joint distribution of $(\theta_1, \cdots, \theta_p)$ on the null hypothesis, we can
obtain, by a technique given in earlier papers [9], [12], the joint distribution of
$(\theta_1, \theta_p)$, from which $F_0$ and $F_0'$ will be available, in terms of $\alpha$, by using


(5.6.3)   $P(\theta_p \ge F_0 \mid \Sigma_1 = \Sigma_2) = P(\theta_1 \le F_0' \mid \Sigma_1 = \Sigma_2)$, and
          $P(\theta_p \ge F_0$ and/or $\theta_1 \le F_0' \mid \Sigma_1 = \Sigma_2) = \alpha.$


(ii) By the test procedure (5.5.2), over $F_\mu = \mu'S^*\mu/\mu'S\mu < F_0$ we accept
$H(\mu'\xi_1 = \cdots = \mu'\xi_k)$, so that over $\bigcap_\mu [F_\mu < F_0]$ we accept
$\bigcap_\mu H(\mu'\xi_1 = \cdots = \mu'\xi_k) = H(\xi_1 = \cdots = \xi_k) = H_0$, and over its complement
$\bigcup_\mu [F_\mu \ge F_0]$ we reject $H_0$. We set it
up as the extended Type I test for $H_0$, and note, as before, that $\bigcup_\mu [F_\mu =
\mu'S^*\mu/\mu'S\mu \ge F_0]$ is precisely equivalent to: the largest $\mu'S^*\mu/\mu'S\mu \ge F_0$, the
"largest" being under variation of $\mu$ (for a given set of observations, that is, for
a given set of $S^*$ and $S$). As before, given $S^*$ and $S$, the largest value of
$\mu'S^*\mu/\mu'S\mu$ is checked to be the largest root, $\theta_q$, of the $p$-th degree determinantal
equation in $\theta$


(5.6.4)   $|S^* - \theta S| = 0.$


From the definitions and assumptions of subsection 5.4 of Section 5, it is easy
to check that $S$ is, a.e., p.d. while $S^*$ is, a.e., at least p.s.d. of rank $q = \min
(p, k - 1)$. It will of course be, a.e., p.d. if $p \le k - 1$. In any case we can say
that, of the $p$ roots of (5.6.4), $p - q$ will be always zero, while $q$ roots, to be called
$\theta_1, \cdots, \theta_q$, will be, a.e., positive, where $q = \min(p, k - 1)$, so that
$0 < \theta_1 \le \cdots \le \theta_q < \infty$ (suppose). The extended Type I critical region for $H_0$ is thus


(5.6.5)   $\theta_q \ge F_0.$






To determine the size of the region, or rather, given the size $\alpha$ of (5.6.5), to
determine $F_0$, we observe what was noted in the earlier papers [4], [7], [10],
namely, that the joint distribution of $(\theta_1, \cdots, \theta_q)$ (on the null hypothesis) in
this case is exactly of the same form as that of $(\theta_1, \cdots, \theta_p)$ of the previous case
(on the null hypothesis in that situation) and that therefore the distribution of
any root, say the largest, will come through by the same technique as was mentioned
for the previous case and will also be independent of all nuisance parameters.
We shall thus have $F_0$ given, in terms of $\alpha$, by


(5.6.6)   $P(\theta_q \ge F_0 \mid \xi_1 = \cdots = \xi_k) = \alpha.$


For $k = 2$ we shall have $q = 1$, so that there will be just one nontrivial sample
root $\theta_1$ ($= \theta$, suppose), and just one nontrivial population root $\theta_1$ ($= \theta$, suppose)
(which will be zero on the null hypothesis and $\ne 0$ on the nonnull hypothesis).
This sample $\theta$ is easily checked to be Hotelling's $T^2$ and its distributions on both the null
and nonnull hypotheses are well known [1], [5] and relatively easy, so that
(5.6.5) and (5.6.6) happen to be computationally much simpler in this situation.
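
The statement for $k = 2$ can be verified numerically. With two groups, $(k - 1)S^*$ reduces to $\{n_1n_2/(n_1 + n_2)\}\,dd'$, where $d$ is the difference of the two mean vectors, so the single nonzero root of the equation in $S^*$ and $S$ should equal the two-sample $T^2 = \{n_1n_2/(n_1 + n_2)\}\,d'S^{-1}d$ with $S$ the pooled covariance matrix. The sketch below is illustrative only: it takes $p = 2$, uses made-up bivariate data and hypothetical function names, and stores each symmetric $2 \times 2$ matrix as a triple $(a, b, d)$.

```python
import math


def mean2(rows):
    """Mean vector of bivariate observations."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def scatter2(rows, m):
    """Sums of squares and products about m, as (a, b, d) for [[a, b], [b, d]]."""
    a = sum((r[0] - m[0]) ** 2 for r in rows)
    b = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    d = sum((r[1] - m[1]) ** 2 for r in rows)
    return (a, b, d)


def pooled_and_between(g1, g2):
    """Pooled covariance S, the matrix S* for k = 2, the factor c and the difference d."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean2(g1), mean2(g2)
    w1, w2 = scatter2(g1, m1), scatter2(g2, m2)
    s = tuple((u + v) / (n1 + n2 - 2) for u, v in zip(w1, w2))
    c = n1 * n2 / (n1 + n2)
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    s_star = (c * diff[0] ** 2, c * diff[0] * diff[1], c * diff[1] ** 2)
    return s, s_star, c, diff


def largest_root(g1, g2):
    """Largest root of |S* - theta * S| = 0, from the quadratic in theta (p = 2)."""
    s, s_star, _, _ = pooled_and_between(g1, g2)
    a1, b1, d1 = s_star
    a2, b2, d2 = s
    qa = a2 * d2 - b2 * b2
    qb = -(a1 * d2 + a2 * d1 - 2.0 * b1 * b2)
    qc = a1 * d1 - b1 * b1                       # zero up to rounding: S* has rank 1
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return (-qb + disc) / (2.0 * qa)


def hotelling_t2(g1, g2):
    """Two-sample T^2 = c * d' S^{-1} d, using the explicit inverse of a 2x2 matrix."""
    s, _, c, diff = pooled_and_between(g1, g2)
    a, b, d = s
    det = a * d - b * b
    quad = (d * diff[0] ** 2 - 2.0 * b * diff[0] * diff[1] + a * diff[1] ** 2) / det
    return c * quad


if __name__ == "__main__":
    g1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]   # made-up data
    g2 = [(3.0, 5.0), (5.0, 4.0), (4.0, 6.0)]               # made-up data
    print(largest_root(g1, g2), hotelling_t2(g1, g2))
```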

(iii) By the test procedure (5.5.4), over $r_{\mu_1\mu_2}^2 = (\mu_1'S_{12}\mu_2)^2 \big/ (\mu_1'S_{11}\mu_1)(\mu_2'S_{22}\mu_2)
\ge r_0^2$, we reject $H(\mu_1'\Sigma_{12}\mu_2 = 0) = H_{0\mu_1\mu_2}$ and over its complement accept this
hypothesis, so that over $\bigcap_{\mu_1\mu_2} [r_{\mu_1\mu_2}^2 < r_0^2]$ we accept $\bigcap_{\mu_1\mu_2} H(\mu_1'\Sigma_{12}\mu_2 = 0) = H_0$,
and over its complement $\bigcup_{\mu_1\mu_2} [r_{\mu_1\mu_2}^2 \ge r_0^2]$ reject $H_0$. We set this up as the
extended Type I test for $H_0$ and note that


$\bigcup_{\mu_1\mu_2} [r_{\mu_1\mu_2}^2 = (\mu_1'S_{12}\mu_2)^2 \big/ (\mu_1'S_{11}\mu_1)(\mu_2'S_{22}\mu_2) \ge r_0^2]$

is exactly equivalent to: the largest value of

$(\mu_1'S_{12}\mu_2)^2 \big/ (\mu_1'S_{11}\mu_1)(\mu_2'S_{22}\mu_2) \ge r_0^2,$


the "largest" being under variation of $\mu_1$ and $\mu_2$ (for a given set of observations,
that is, for a given set of $S_{11}$, $S_{22}$ and $S_{12}$). As before, the largest value of this
expression is checked to be the largest root $\theta_{p_1}$ of the $p_1$-th degree determinantal
equation in $\theta$


(5.6.7)   $|\theta S_{11} - S_{12}S_{22}^{-1}S_{12}'| = 0.$


From the definitions and assumptions of subsection 5.4 of Section 5, it is easy
to see that $S$ and, therefore, $S_{11}$ and $S_{22}$ are, a.e., p.d. and $S_{12}$ is, a.e., of rank $p_1$.
Under these conditions it is well known and proved in a number of places [6],
[15] that the $p_1$ roots of (5.6.7) will all, a.e., lie between 0 and 1, satisfying, say,
$0 < \theta_1 \le \theta_2 \le \cdots \le \theta_{p_1} < 1$. The extended Type I region for $H_0$ is thus


(5.6.8)   $\theta_{p_1} \ge r_0^2.$


To determine the size of this region, or rather, given the size $\alpha$, to determine
$r_0^2$, we observe that the joint distribution of $(\theta_1, \cdots, \theta_{p_1})$ on the null hypothesis
in this case goes over (under a simple transformation from cosine to cotangent)
into that of the joint distribution of the roots (on the respective null hypotheses)
in the two previous cases and the same technique for finding the distribution of
the largest root also goes through. As before, this distribution will also be
independent of all nuisance parameters. We shall thus have $r_0^2$ given, in terms of $\alpha$, by


(5.6.9)   $P(\theta_{p_1} \ge r_0^2 \mid \Sigma_{12} = 0) = \alpha.$


6. Lower bounds of the powers of the test regions (5.6.2), (5.6.5) and (5.6.8) 
for the hypotheses (i), (ii), and (iii). 

6.1. Observations on the actual power functions. 

(i) It is well known that on the nonnull hypothesis the joint distribution of $(\theta_1, \ldots, \theta_p)$ of (5.6.1) (and hence of $\theta_1$ and $\theta_p$) also involves as parameters only the $p$ roots $\theta_1, \ldots, \theta_p$ of the population determinantal equation in $\theta$,

(6.1.1)    $|\,\Sigma_1 - \theta\Sigma_2\,| = 0.$

(Notice that, assuming $\Sigma_1$ and $\Sigma_2$ to be both p.d., these roots will all be positive and they will all be unity if and only if $\Sigma_1 = \Sigma_2$, that is, on the null hypothesis in this situation.) The exact distribution of $(\theta_1, \ldots, \theta_p)$ or of $(\theta_1, \theta_p)$ on the nonnull hypothesis will be quite complicated and whatever reduction is already known to be possible [11] will not be discussed here. We shall merely write the power function formally as

(6.1.2)    $P[\,\theta_p \ge F_0 \text{ and/or } \theta_1 \le F_0' \mid \Sigma_1 \ne \Sigma_2\,] = P\{\alpha; n_1, n_2, p; \theta_1, \theta_2, \ldots, \theta_p\},$

to indicate on which parameters the power depends.

(ii) To discuss the power function of the region (5.6.5), we use the convenient notation: $\bar\xi = \sum_{r=1}^{k} n_r\xi_r/n$; $\xi(p \times k) = (\sqrt{n_1}\,\xi_1, \ldots, \sqrt{n_k}\,\xi_k)$; $(k - 1)\Sigma^* = \xi\xi' - n\bar\xi\bar\xi'$; and we denote by $\Sigma$ the (assumed) common p.d. covariance matrix of the $k$ populations. We note that $\Sigma^*(p \times p)$ is p.s.d. (and might also be p.d.) of rank $r \le \min(p, k - 1)$, where $r$ is the rank of the matrix

    $(\sqrt{n_1}(\xi_1 - \bar\xi), \ldots, \sqrt{n_k}(\xi_k - \bar\xi)).$

Notice that the rank of this matrix must be $\le \min(p, k - 1)$. Notice further that $\Sigma^*$ will be zero if and only if $\xi_1 = \xi_2 = \cdots = \xi_k$, that is, on the null hypothesis in this situation. We next observe, as is well known, that on the nonnull hypothesis the joint distribution of $(\theta_1, \ldots, \theta_q)$ of (5.6.4) (and also of $\theta_q$) will involve as parameters only the $r$ ($\le q = \min(p, k - 1)$) positive roots (the $p - r$ others being zero) of the $p$-th degree population determinantal equation in $\theta$,

(6.1.3)    $|\,\Sigma^* - \theta\Sigma\,| = 0.$


As in the previous cases, so also here, the exact distribution of $(\theta_1, \ldots, \theta_q)$ or of $\theta_q$ on the nonnull hypothesis will be quite complicated and also different from that of the previous situation and whatever reduction is already known to be possible will, as before, not be discussed here. We shall again formally write the power function as

(6.1.4)    $P[\,\theta_q \ge F_0 \mid \text{under violation of } H(\xi_1 = \cdots = \xi_k)\,] = P\{\alpha; n, k, p; \theta_1, \ldots, \theta_r\},$

to indicate the dependence on the relevant parameters. When $k = 2$ we have $q = 1$, $r = 1$, and in this case (6.1.4) will be the power function of Hotelling's $T^2$-test, which is computationally quite manageable.

(iii) To discuss the power function of the region (5.6.8), we observe what is well known, namely that, on the nonnull hypothesis, the joint distribution of $(\theta_1, \ldots, \theta_{p_1})$ of (5.6.7) involves as parameters only the roots of the $p_1$-th degree population determinantal equation in $\theta$,

(6.1.5)    $|\,\theta\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,| = 0.$

Assuming a p.d. $\Sigma$, it is also known that $\Sigma_{11}$ and $\Sigma_{22}$ are both p.d., all roots being less than 1 and $q$ roots being positive and $p_1 - q$ being zero, where $q$ is the rank of $\Sigma_{12}$ ($q \le p_1 \le p_2$). We write the positive roots as $0 < \theta_1 \le \cdots \le \theta_q < 1$. We shall not further discuss the complicated nonnull distribution of $(\theta_1, \ldots, \theta_{p_1})$ or of $\theta_{p_1}$, but merely write down formally the power function of the critical region (5.6.8) as

(6.1.6)    $P[\,\theta_{p_1} \ge r_0^2 \mid \Sigma_{12} \ne 0\,] = P\{\alpha; n, p_1, p_2; \theta_1, \ldots, \theta_q\},$

to indicate the dependence on the relevant parameters.

Although the exact nonnull distributions and hence the exact power functions would be quite complicated in all the foregoing cases, we could, if we wanted to, obtain lower bounds by using (4.2) and noting that the nonnull distributions for the univariate situations (iv) and (v) associated with (i) and (ii), and the bivariate situation (vi) associated with (iii), are all known in computationally manageable forms. But it is possible, as is shown in the next two subsections (6.2) and (6.3), to obtain much closer lower bounds to the power functions (6.1.2), (6.1.4) and (6.1.6). This is accomplished as follows.

6.2. On invariance and independence. It is well known [15] that

(i) the roots of (5.6.1) are invariant under the transformation

    $S_1(p \times p) = \mu(p \times p)\,V_1(p \times p)\,\mu'(p \times p)$ and
    $S_2(p \times p) = \mu(p \times p)\,V_2(p \times p)\,\mu'(p \times p),$

where $\mu$ is any constant (i.e., nonstochastic) nonsingular transformation matrix,

(ii) the roots of (5.6.4) are invariant under the transformation

    $S^*(p \times p) = \mu(p \times p)\,V^*(p \times p)\,\mu'(p \times p)$ and
    $S(p \times p) = \mu(p \times p)\,V(p \times p)\,\mu'(p \times p),$

where $\mu$ is any constant nonsingular transformation matrix, and finally

(iii) the roots of (5.6.7) are invariant under the transformation

    $S_{11}(p_1 \times p_1) = \mu_1(p_1 \times p_1)\,V_{11}(p_1 \times p_1)\,\mu_1'(p_1 \times p_1),$
    $S_{22}(p_2 \times p_2) = \mu_2(p_2 \times p_2)\,V_{22}(p_2 \times p_2)\,\mu_2'(p_2 \times p_2)$
and
    $S_{12}(p_1 \times p_2) = \mu_1(p_1 \times p_1)\,V_{12}(p_1 \times p_2)\,\mu_2'(p_2 \times p_2),$

where $\mu_1$ and $\mu_2$ are any two constant nonsingular transformation matrices.
We next notice that

(i) there exists a nonsingular transformation matrix (not necessarily unique),

    $\mu(p \times p) = (\mu_1, \ldots, \mu_p)',$

under which $\mu\Sigma_1\mu' = D_\theta$ and $\mu\Sigma_2\mu' = I(p)$ (where $D_\theta$ is a $p \times p$ diagonal matrix whose diagonal elements are $\theta_1, \ldots, \theta_p$) and which transforms the $p$ original variates into $p$ new variates distributed in a canonical form, so that, for this set of $p$ $\mu_i$'s ($i = 1, 2, \ldots, p$), $(\mu_i' S_1\mu_i/\mu_i' S_2\mu_i)/(\mu_i'\Sigma_1\mu_i/\mu_i'\Sigma_2\mu_i)$, that is, $(\mu_i' S_1\mu_i/\mu_i' S_2\mu_i)/\theta_i$ ($i = 1, \ldots, p$), will be distributed as $p$ independent $F$'s, each with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom,

(ii) there exists a nonsingular matrix (not necessarily unique), $\mu(p \times p) = (\mu_1, \ldots, \mu_p)'$, under which $\mu\Sigma^*\mu' = D_\theta$ and $\mu\Sigma\mu' = I(p)$ (where $D_\theta$ is a diagonal matrix, of whose $p$ diagonal elements $p - r$ are exactly zero, while the rest, $r$ in number, are $\theta_1, \ldots, \theta_r > 0$), and, furthermore, that this transforms the $p$ original variates into $p$ new variates distributed in a canonical form, so that for this set of $\mu_i$'s ($i = 1, 2, \ldots, p$), $(\mu_i' S^*\mu_i/\mu_i' S\mu_i)$ ($i = 1, 2, \ldots, p$) will be distributed as $p$ independent $F$'s each with $(k - 1)$ and $(n - k)$ degrees of freedom. We note that out of these $p$ $F$'s, $p - r$ are necessarily central $F$'s (i.e., with “deviation parameters” equal to zero) and $r$ $F$'s are noncentral with “deviation parameters” $(\theta_1, \ldots, \theta_r)$, and

(iii) there exist nonsingular matrices (none necessarily unique),

    $\mu_1(p_1 \times p_1) = (\mu_{11}, \ldots, \mu_{p_1 1})',$
    $\mu_2(p_2 \times p_2) = (\mu_{12}, \ldots, \mu_{p_2 2})',$

under which $\mu_1\Sigma_{11}\mu_1' = I(p_1)$, $\mu_2\Sigma_{22}\mu_2' = I(p_2)$ and

    $\mu_1\Sigma_{12}\mu_2' = (D_{\sqrt\theta} \;\; 0)$

(where $D_{\sqrt\theta}$ is a $p_1 \times p_1$ diagonal matrix of whose diagonal elements $p_1 - q$ are zero and the rest are nonzero, being $\sqrt{\theta_1}, \ldots, \sqrt{\theta_q}$), and which transforms the original $(p_1 + p_2)$ variates into two new sets of $p_1$ and $p_2$ variates, jointly distributed in a canonical form with covariance matrix

    $\begin{pmatrix} I(p_1) & (D_{\sqrt\theta} \;\; 0) \\ (D_{\sqrt\theta} \;\; 0)' & I(p_2) \end{pmatrix},$

the block $(D_{\sqrt\theta} \;\; 0)$ having $p_1$ rows and being partitioned into $p_1$ and $p_2 - p_1$ columns.

This means that from the sets $\mu_{i1}$ ($i = 1, 2, \ldots, p_1$) and $\mu_{j2}$ ($j = 1, 2, \ldots, p_2$) it is possible to pick out linked $\mu_{i1}$ and $\mu_{i2}$ ($i = 1, 2, \ldots, p_1$) such that $(\mu_{i1}' S_{12}\mu_{i2})^2/(\mu_{i1}' S_{11}\mu_{i1})(\mu_{i2}' S_{22}\mu_{i2})$ ($i = 1, 2, \ldots, p_1$) are distributed as the squares of $p_1$ independent correlation coefficients $r_i$ with $(n - 2)$ degrees of freedom each, the distributions involving $\theta_i = \rho_i^2$ ($i = 1, 2, \ldots, q \le p_1$) as “deviation parameters”. The absolute value of the total correlation coefficient will be indicated by enclosing the correlation in vertical bars. It is the distribution of this, that is, the distribution of multiple correlation when $p = 2$, that will come into the picture. It is possible to go even beyond this and pick out linked $\mu_{i1}$ and $\mu_{i2}$ ($i = 1, 2, \ldots, p_1 - 1$), and at the last stage a $\mu_{p_1 1}$ linked with a set of $(p_2 - p_1 + 1)$ $\mu_{i2}$'s ($i = p_1, p_1 + 1, \ldots, p_2$), such that there are $p_1$ independently distributed | correlations |, of which $(p_1 - 1)$ are | total correlations |, and the last one is a multiple correlation between the $p_1$th variate of the first $p_1$-set and the $(p_1, p_1 + 1, \ldots, p_2)$ variates of the second $p_2$-set. The deviation parameters being $\theta_i$ ($0 < \theta_1 \le \cdots \le \theta_q < 1$), we could so arrange that the first $p_1 - q$ sample (total) | correlations | had zero deviation parameters to go with, the next $q - 1$ sample (total) | correlations | had respective (and one each) deviation parameters $(\theta_1, \ldots, \theta_{q-1})$ to go with and the last sample (multiple) correlation had $\theta_q$ to go with.

6.3. Actual construction of lower bounds. Now notice that

(i) in the first problem, the region (5.6.2) includes as well all the $F$-regions considered under (i) of the foregoing subsection (6.2), so that, to the power function $P$ of (6.1.2) we shall have a lower bound given by

(6.3.1)    $P\{\alpha; n_1, n_2, p; \theta_1, \ldots, \theta_p\} > 1 - \prod_{i=1}^{p}\,[\,1 - P(F \ge F_0 \text{ or } \le F_0' \mid \theta_i)\,]$

(each with $n_1 - 1$ and $n_2 - 1$ degrees of freedom), which is easily calculable;

(ii) in the second problem, the region (5.6.5) includes as well all the $F$-regions considered under (ii) of the preceding subsection 6.2, so that, to the power function $P$ of (6.1.4) we shall have a lower bound given by

(6.3.2)    $P\{\alpha; n, k, p; \theta_1, \ldots, \theta_r\} > 1 - [\,1 - P(\text{central } F \ge F_0)\,]^{p-r}\prod_{i=1}^{r}\,[\,1 - P(\text{noncentral } F \ge F_0 \mid \theta_i)\,]$

(each with $k - 1$ and $n - k$ degrees of freedom), which is easily calculated; and finally

(iii) in the third problem, the region (5.6.8) includes as well all the | correlation | regions considered under (iii) of the foregoing subsection 6.2, so that, to the power function $P$ of (6.1.6) we shall have a lower bound given by

(6.3.3)    $P\{\alpha; n, p_1, p_2; \theta_1, \ldots, \theta_q\} > 1 - [\,1 - P(r^2 \ge r_0^2 \mid \text{null hypothesis})\,]^{p_1-q} \times \prod_{i=1}^{q}\,[\,1 - P(r^2 \ge r_0^2 \mid \rho^2 = \theta_i)\,]$

(each with $n - 2$ degrees of freedom), which is easily calculable, being really the power function of the multiple correlation of the first kind [2], when $p = 2$, for which tables are in part available which could easily be extended with modern computing facilities.
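The three bounds (6.3.1) to (6.3.3) share one arithmetic form: one minus a product of "miss" probabilities of independent component tests. The following is a minimal sketch of that arithmetic only. The function name `power_lower_bound` and the numerical inputs are hypothetical; in practice the component powers would be read from (non)central $F$ or correlation tables, which this sketch does not compute.

```python
from math import prod


def power_lower_bound(central_power, n_central, noncentral_powers):
    """Return 1 - (1 - P0)^m * prod_i (1 - P_i).

    central_power     : rejection probability P0 of one component test whose
                        deviation parameter is zero (i.e. its size).
    n_central         : number m of such components (p - r, or p1 - q).
    noncentral_powers : rejection probabilities P_i of the components with
                        nonzero deviation parameters theta_i.
    All probabilities are assumed to be supplied by the caller.
    """
    probs = [central_power, *noncentral_powers]
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if n_central < 0:
        raise ValueError("n_central must be non-negative")
    miss = (1.0 - central_power) ** n_central
    miss *= prod(1.0 - p for p in noncentral_powers)
    return 1.0 - miss


# Hypothetical inputs: two central components of size 0.05 and two
# noncentral components with powers 0.4 and 0.7.
bound = power_lower_bound(0.05, 2, [0.4, 0.7])
```

Because every factor of the product is at most 1, the bound is never smaller than the largest single component power, which is why adding independent components can only raise it.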

The lower bound (6.3.3) could be easily improved, when $p_2 > p_1$, by the following consideration. Going back to the observations made at the end of subsection 6.2 of this section (on independence between two sets of variates), we notice that since the region (5.6.8) includes $p_1 - 1$ | (total) correlation | regions and one (multiple) correlation region we shall have a lower bound (easily checked to be larger than (6.3.3)) given by

(6.3.4)    $P\{\alpha; n, p_1, p_2; \theta_1, \ldots, \theta_q\} > 1 - [\,1 - P(r^2 \ge r_0^2 \mid \text{null hypothesis})\,]^{p_1-q} \times \prod_{i=1}^{q-1}\,[\,1 - P(r^2 \ge r_0^2 \mid \rho_i^2 = \theta_i)\,] \times [\,1 - P(R^2 \ge r_0^2 \mid \rho_q^2 = \theta_q)\,],$

where all factors except the last are on | total correlations | distributed with $(n - 2)$ degrees of freedom, while the last factor is on a multiple correlation distributed with $(n - 2)$ degrees of freedom and $(p_2 - p_1)$.

It may be noted that in (6.3.2) both sides of the inequality are “known,” that is, computationally accessible when $k = 2$, that is, $q = 1$ and $r = 1$, the left-hand side being just the power function of Hotelling's $T^2$, while the right-hand side is also easily available (in this as in all other cases).


7. Concluding remarks. It is of considerable importance at this stage to ask 
how “good” the lower bounds indicated in (6.3.1), (6.3.2) and (6.3.3) or (6.3.4) 
are. A lower bound to the power could be said to be “good” if it were (i) close 
to the actual power, and/or (ii) if it were itself pretty large, being greater than 
the level of significance a for reasonably large values of the deviation parameters 
and possibly getting larger as those parameters increase. For all the three tests 
condition (ii) has been numerically checked to be true over a fairly wide range 
of test values of the several parameters involved, and part of that material will 
be offered in a later paper. With regard to condition (i), in general, that is, for small samples, not only do we not know the actual power (in which case the search for a lower bound would have been redundant) but at the moment we do not even know an upper bound of the expression: (actual power − given lower bound to it) ÷ actual power. In large samples, however, the situation improves and it turns out that the relative error is “small,” so that the given lower bounds are “good” also in the sense (i).

The next pertinent question now under investigation is whether the proposed 
test regions (5.6.2), (5.6.5) and (5.6.8) are (a) unbiased and (b) admissible 
against all relevant alternatives under the respective situations. 

Also under investigation is the question as to how these tests compare with 
the corresponding likelihood ratio or Type II tests. On this it may be observed 
here, that, except in the degenerate cases where the two methods lead to the 
identical test, as, for example, the case $k = 2$ under (ii) where both lead to
Hotelling's $T^2$, the likelihood ratio tests have a far more difficult small sample
(null) distribution problem to contend with than the proposed test. This is with 
regard to direct usability of the test. The small sample (nonnull) distribution 
problem (connected with the question of power) would be quite difficult for both 
types of test, but more so for the likelihood ratio test than for the other. This 
rules out direct evaluation of power for both types of test, but, while we have 
fairly good lower bounds to the power of the three different tests proposed, 
we do not at the moment know of any such lower bounds to the power of the 
corresponding likelihood ratio tests. 


REFERENCES 


[The references given below are by no means exhaustive, being merely such as would 
just suffice for an understanding of this paper.] 
[1] R. C. Bose and S. N. Roy, "The distribution of the studentized $D^2$-statistic," Sankhyā, Vol. 4 (1938), pp. 10-38.
[2] R. A. Fisher, "The general sampling distribution of the multiple correlation coefficient," Proc. Roy. Soc. London, Ser. A, Vol. 121 (1928), pp. 654-673.
[3] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear equations," Ann. Eugenics, Vol. 9 (1939), pp. 238-249.
[4] M. A. Girshick, "On the sampling theory of the roots of determinantal equations," Ann. Math. Stat., Vol. 10 (1939), pp. 203-224.
[5] H. Hotelling, "The generalization of Student's ratio," Ann. Math. Stat., Vol. 2 (1931), pp. 360-378.
[6] H. Hotelling, "Relations between two sets of variates," Biometrika, Vol. 28 (1936), pp. 321-377.
[7] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Ann. Eugenics, Vol. 9 (1939), pp. 250-258.
[8] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses," Ann. Math. Stat., Vol. 19 (1948), pp. 495-516.
[9] D. N. Nanda, "Distribution of a root of a determinantal equation," Ann. Math. Stat., Vol. 19 (1948), pp. 47-57.
[10] S. N. Roy, "p-statistics or some generalizations in analysis of variance appropriate to multivariate problems," Sankhyā, Vol. 4 (1939), pp. 381-396.
[11] S. N. Roy, "The sampling distribution of p-statistics and certain allied statistics on the nonnull hypothesis," Sankhyā, Vol. 6 (1942), pp. 15-34.
[12] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and any intermediate of the p-statistics on the null-hypothesis," Sankhyā, Vol. 7 (1945), pp. 133-158.
[13] S. N. Roy, "Notes on testing of composite hypotheses-II," Sankhyā, Vol. 9 (1948), pp. 19-38.
[14] S. N. Roy, "Univariate and multivariate analysis as problems in testing of composite hypotheses-I," Sankhyā, Vol. 10 (1950), pp. 29-80.
[15] S. N. Roy, "Lecture notes on multivariate analysis," (unpublished).





ON A CLASS OF PROBLEMS RELATED TO THE RANDOM DIVISION 
OF AN INTERVAL 


By D. A. DARLING 


Columbia University 


Summary. Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables each distributed uniformly over the interval (0, 1), and let $Y_0, Y_1, \ldots, Y_n$ be the respective lengths of the $n + 1$ segments into which the unit interval is divided by the $\{X_i\}$. A fairly wide class of statistical problems is related to finding the distribution of certain functions of the $Y_j$; these problems are reviewed in Section 1. The principal result of this paper is the development of a contour integral for the characteristic function (ch. fn.) of the random variable $W_n = \sum_{j=0}^{n} h_j(Y_j)$ for quite arbitrary functions $h_j(x)$, this result being essentially an extension of the classical integrals of Dirichlet. The cases of statistical interest correspond to $h_j(x) = h(x)$, independent of $j$. There is a fairly extensive literature devoted to studying the distributions for various functions $h(x)$. By applying our method these distributions and others are readily obtained, in a closed form in some instances, and generally in an asymptotic form by applying a steepest descent method to the contour integral.


1. Introduction. The statistical problems mentioned above are divided roughly into two classes: problems related to considerations of the Poisson stochastic process occurring in the study of infectious diseases, traffic flow, etc., and problems pertaining to certain nonparametric tests of the hypothesis that a given set of data came from a hypothetical cumulative distribution function (cdf) $F(x)$, which in turn are related to certain “goodness of fit” problems.

In 1946 Greenwood [6], in connection with a problem in epidemiology, posed the general problem of testing whether a given set of points on the unit interval could have arisen from the independent selection of points $X_i$ described above, or whether the set of intervals $Y_j$ they generate are too nearly equal for this hypothesis to be tenable. He suggested the statistic $W_n = \sum_{j=0}^{n} Y_j^2$ and gave a few properties of its distribution. Later Moran [11] proved that $W_n$ had a limiting Gaussian distribution for $n \to \infty$.

If $U_0, U_1, \ldots, U_n$ are $n + 1$ independent random variables each having the density $\beta e^{-\beta x}$, $\beta > 0$, $x > 0$, and if $s_n = U_0 + U_1 + \cdots + U_n$, it is a well known fact that the joint distribution of $\{U_j/s_n\}$, $j = 0, 1, \ldots, n$, is the same as the joint distribution of $\{Y_j\}$, $j = 0, 1, \ldots, n$, the successive lengths of the intervals into which the unit interval is divided by $n$ random points. This correspondence has been used in studying the Poisson stochastic process (cf. [3] chap. 17) in which the intervals between successive occurrences of the phenomenon are the $U_j$. In Greenwood's example these phenomena were the outbreaks of infectious disease.
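The correspondence between normalized exponential gaps and uniform spacings can be checked by simulation. The sketch below is illustrative only: the helper names, the seed and the choice of Greenwood's sum of squares as the comparison statistic are assumptions made here, and the reference value $2/(n + 2)$ is the mean of $\sum Y_j^2$ that follows from each $Y_j$ having a Beta$(1, n)$ distribution.

```python
import random


def uniform_spacings(n, rng):
    """Lengths of the n + 1 segments cut from (0, 1) by n uniform points."""
    points = sorted(rng.random() for _ in range(n))
    edges = [0.0] + points + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def exponential_spacings(n, rng, beta=1.0):
    """U_j / s_n for n + 1 independent exponential variables U_j."""
    gaps = [rng.expovariate(beta) for _ in range(n + 1)]
    total = sum(gaps)
    return [u / total for u in gaps]


def mean_greenwood(sampler, n, reps, rng):
    """Monte Carlo mean of Greenwood's statistic, the sum of squared spacings."""
    acc = 0.0
    for _ in range(reps):
        acc += sum(y * y for y in sampler(n, rng))
    return acc / reps


rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 5
print(mean_greenwood(uniform_spacings, n, 4000, rng))
print(mean_greenwood(exponential_spacings, n, 4000, rng))
print(2.0 / (n + 2))  # exact mean under either construction
```

Agreement of one moment is of course only a sanity check, not a demonstration that the joint distributions coincide.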


Received 6/24/52. 





In these examples the statistical problems can be reduced to evaluating the distribution of $W_n = \sum h(Y_j)$. In place of Greenwood's suggestion of $h(x) = x^2$, other suggestions were made (cf. the discussion of [6]). Kendall suggested that $h(x) = |x - 1/(n + 1)|$ might be analytically more tractable and Irwin suggested $h(x) = (n + 1)^{-1}(x - 1/(n + 1))^2$. For an analysis of the distribution properties of the extreme $Y_j$ (or $U_j$) it suffices to consider an $h(x)$ which is 1 for $\alpha < x < \beta$ and zero otherwise. A variety of problems can be reduced to determining the distribution of $W_n = \sum h(Y_j)$ for $h(x)$ of this form. Garwood [5] studied some extremal properties of the $Y_j$ in connection with the occurrence of traffic vehicles on a highway. Fisher [4] had made a similar use in 1925 on the distribution of an extreme amplitude in a problem in harmonic analysis. Kendall made the suggestion of studying the difference (or quotient) of the largest and smallest $Y_j$ as being a more sensitive test function for the equality of the $Y_j$ than Greenwood's sum of squares.

Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables with the common continuous cdf $F(x)$. Let them be relabeled so that $X_1 < X_2 < \cdots < X_n$ and put $X_0 = -\infty$, $X_{n+1} = +\infty$. Then, as is well known, the joint distribution of $\{F(X_{j+1}) - F(X_j)\}$, $j = 0, 1, \ldots, n$, is the same as the joint distribution of the $\{Y_j\}$, $j = 0, 1, \ldots, n$. Given a set of $n$ data $x_1, x_2, \ldots, x_n$ arranged in increasing order (with $x_0 = -\infty$, $x_{n+1} = +\infty$) a possible test of the hypothesis $H$ that they came from a population whose cdf is $F(x)$ consists in choosing a function $h(x)$ and rejecting $H$ if $\sum_j h(F(x_{j+1}) - F(x_j))$ is sufficiently large or sufficiently small. Thus the basic problem is, as before, calculating the distribution of $W_n = \sum h(Y_j)$ for various functions $h$.
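As an illustration of how such a test statistic is formed from data, the following sketch maps a sample through a hypothesized cdf and sums $h$ over the resulting spacings. The function names and the toy data are hypothetical; the cdf is supplied by the caller, and no critical values are computed here.

```python
def probability_spacings(data, cdf):
    """Spacings F(x_(j+1)) - F(x_(j)), j = 0..n, with F(x_0) = 0 and F(x_(n+1)) = 1."""
    u = sorted(cdf(x) for x in data)
    edges = [0.0] + u + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def spacing_statistic(data, cdf, h):
    """W_n = sum_j h(F(x_(j+1)) - F(x_(j))) for a caller-chosen function h."""
    return sum(h(y) for y in probability_spacings(data, cdf))


# Toy data tested against the uniform cdf on (0, 1).
data = [0.7, 0.1, 0.5]
n = len(data)
greenwood = spacing_statistic(data, lambda x: x, lambda y: y * y)
sherman = spacing_statistic(data, lambda x: x, lambda y: 0.5 * abs(y - 1.0 / (n + 1)))
```

Passing $h$ as an argument keeps the sketch neutral between the choices discussed in the text (squares, absolute deviations, indicator functions).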

Kimball [7] suggested $h(x) = x^a$, $a > 0$, and gave some partial results for the case $a = 2$. The asymptotic character of $W_n$ for $a = 2$ was later analyzed by Moran [11] who proved $W_n$ has a limiting normal distribution for $n \to \infty$. Sherman [13] treated the case $h(x) = \frac{1}{2}|x - 1/(n + 1)|$. It will be noted that these tests are somewhat related to the Kolmogoroff-Smirnov tests (cf. [1]) of the “goodness of fit” criteria. A discussion of the relative merits of these tests seems quite academic in view of the complete lack of information concerning their power.

In the present paper we give a unified treatment of these distributions. In Section 2 we develop a simple formula for the ch. fn. of the random variable $W_n = \sum h_j(Y_j)$ (Theorem 2.1) which is essentially an extension of the Dirichlet integral (Theorem 2.2). In Section 3 we study the joint distribution of the $Y_j$, finding the joint ch. fn. (Theorem 3.1) and the distribution of $Y_0 + Y_1 + \cdots + Y_\nu$. In Section 4 we put $W_n = \sum h(Y_j)$ and develop a few moments of $W_n$ useful in the subsequent work, and in Section 5 we give the asymptotic distribution of $W_n$ for $h(x) = x^a$, the statistic of Greenwood, Moran and Kimball (Theorem 5.1). In Section 6 we analyze the distribution of Sherman and in Section 7 present two more possible test functions which yield readily to our methods.

In Section 8 we study the random variable $N_n(\alpha, \beta)$, the number of those $Y_j$ satisfying $\alpha < Y_j < \beta$, $j = 0, 1, \ldots, n$. As special cases we obtain the limiting distributions of the number of intervals of “average” size, “small” size and “large” size (Theorems 8.1, 8.2 and 8.3, respectively) and the joint distribution of the largest and smallest $Y_j$ for finite $n$ (Theorem 8.4).


2. The fundamental formula. Let $Y_0, Y_1, \ldots, Y_n$ be the lengths of the $n + 1$ intervals into which the unit interval is divided by $n$ random points. The following theorem is the basis for the subsequent analysis in this paper.

THEOREM 2.1. Let $f_0(x), f_1(x), \ldots, f_n(x)$ be $n + 1$ real-valued functions for which the abscissas of convergence of the corresponding Laplace transforms are all less than $c$. Then

(2.1)    $E(f_0(Y_0) f_1(Y_1) \cdots f_n(Y_n)) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{z}\prod_{j=0}^{n}\int_0^\infty e^{-r_j z} f_j(r_j)\,dr_j\,dz,$

the path of integration being the straight line Re $z = c$ (where Re $z$ denotes the real part of $z$).


-1 Zn pTno1 z3 


(2.2) - (1107) “ss | | | ie | | fola file — 24) 


" Sa-thBe — Sah ~~ Zn) dz, die co* dr, 


since the joint distribution of the n random points, when arranged in order, 
has a uniform density differential n! dx, dx, --- dz, over the simplex 0 S x; S 
dw S--- S 2x, S 1. The trick in “evaluating” this integral consists in con- 
sidering the following function 


Zn—1 73 


Fo) =|] | j ep | fled files — 2) 


2+ falta — Sa-rfalr — Za) dx, daz --- dz, 

which we want to evaluate at r = 1. But it is clear that written this way F(r) 
z 

is merely the convolution foxfie +--+ *fn(r) where g(x)sh(x) =] g(x — Oh(t) dt. 
“0 


Since Laplace transforms multiply under convolution we obtain 


| F(r)e* dr = I] | e*"' f,(r,) dr, 
/0 7=0 +0 


provided Re z > c. We now simply apply the complex inversion for the Laplace 
transform to obtain 


1 ct to n p® 
Fx) = 5 [et IL [ ofr) ar, de, 
atl Jc—iw j=0 -'0 
and the theorem follows if we put x = 1 and supply the factor n!. 
It is interesting to note that in (2.1) the value of the integra! apparently 
depends on the value of the f;(r) for r > 1 while in (2.2) it does not. As a matter 





242 D. A. DARLING 


of fact the functions may be defined quite arbitrarily for r > 1 and not affect 
the value of (2.1). 

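THEOREM 2.2. Let $D$ be the domain in $E_n$ defined by $t_i \ge 0$, $\sum_{i=1}^{n} t_i \le 1$. Then for the $f_j(x)$ as in Theorem 2.1

    $\displaystyle\int\!\cdots\!\int_D f_0(t_1) f_1(t_2) \cdots f_{n-1}(t_n)\, f_n(1 - t_1 - t_2 - \cdots - t_n)\,dt_1\,dt_2 \cdots dt_n = \dfrac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{z}\prod_{j=0}^{n}\int_0^\infty e^{-r_j z} f_j(r_j)\,dr_j\,dz.$

To prove the theorem we merely make the change of variables in (2.2) $t_1 = x_1$, $t_2 = x_2 - x_1$, $\ldots$, $t_n = x_n - x_{n-1}$, for which the Jacobian is 1.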

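Theorem 2.2 is, in a sense, a generalization of the integral of Dirichlet; that is, putting $f_0(x) = x^{a_1 - 1}$, $f_1(x) = x^{a_2 - 1}$, $\ldots$, $f_{n-1}(x) = x^{a_n - 1}$, $a_i > 0$, and $f_n(x) = f(x)$ we obtain

    $\displaystyle\int\!\cdots\!\int_D t_1^{a_1 - 1} t_2^{a_2 - 1} \cdots t_n^{a_n - 1}\, f\Big(1 - \sum t_i\Big)\,dt_1\,dt_2 \cdots dt_n$

    $= \dfrac{1}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{z}\prod_{j=1}^{n}\int_0^\infty e^{-rz} r^{a_j - 1}\,dr\int_0^\infty e^{-rz} f(r)\,dr\,dz$

(2.3)    $= \displaystyle\prod_{j=1}^{n}\Gamma(a_j)\int_0^\infty f(r)\,\dfrac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{z(1 - r)} z^{-\sum a_j}\,dz\,dr$

    $= \dfrac{\prod\Gamma(a_j)}{\Gamma(\sum a_j)}\displaystyle\int_0^1 (1 - r)^{\sum a_j - 1} f(r)\,dr,$

since the inner complex integral in (2.3) is zero if $r > 1$ and is $(1 - r)^{\sum a_j - 1}/\Gamma(\sum a_j)$ if $0 < r < 1$. This is the classical Dirichlet integral usually developed through the theory of the Beta functions (cf. Whittaker and Watson [17]).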


3. The joint distribution of the $\{Y_j\}$. By means of Theorem 2.1 we can give certain properties of the joint distribution function of the $Y_j$, $j = 0, 1, \ldots, n$. For the ch. fn. of the $Y_j$ we have the following theorem.

THEOREM 3.1. If $t_i \ne t_j$ for $i \ne j$, and if $n \ge 1$, then

    $E\Big(\exp\Big(i\sum_{j=0}^{n} t_j Y_j\Big)\Big) = \dfrac{n!}{i^{n}}\displaystyle\sum_{j=0}^{n}\dfrac{e^{it_j}}{\prod_{k \ne j}(t_j - t_k)},$

and is defined for other values of the $t_j$ by continuity.

To prove the theorem we put $f_j(Y_j) = e^{it_j Y_j}$ in Theorem 2.1, giving

    $E\big(e^{i\sum t_j Y_j}\big) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty}\dfrac{e^{z}\,dz}{\prod_{j=0}^{n}(z - it_j)},$

and if all of the $t_j$ are unequal the integral can be replaced by a contour integral surrounding the simple poles. A simple application of the theory of residues then establishes Theorem 3.1.
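The residue sum can be evaluated directly. The sketch below assumes the form $n!\,i^{-n}\sum_j e^{it_j}/\prod_{k \ne j}(t_j - t_k)$, which is what the residues of $e^z/\prod_j (z - it_j)$ at the simple poles $z = it_j$ give; the function name `spacing_cf` is hypothetical. For $n = 1$ the result can be compared with the elementary integral $\int_0^1 e^{i(t_0 x + t_1(1 - x))}\,dx$.

```python
import cmath


def spacing_cf(t):
    """Joint ch. fn. E exp(i * sum_j t_j Y_j) of the n + 1 spacings.

    `t` holds t_0, ..., t_n, which must be pairwise distinct (the residue
    formula has simple poles only in that case).
    """
    n = len(t) - 1
    if n < 1:
        raise ValueError("need at least two arguments t_0, t_1")
    if len(set(t)) != len(t):
        raise ValueError("the t_j must be pairwise distinct")
    factorial_n = 1
    for m in range(2, n + 1):
        factorial_n *= m
    total = 0j
    for j, tj in enumerate(t):
        denom = 1.0
        for k, tk in enumerate(t):
            if k != j:
                denom *= tj - tk
        total += cmath.exp(1j * tj) / denom
    return factorial_n * total / (1j ** n)


# n = 1: one random point X, spacings Y_0 = X and Y_1 = 1 - X.
t0, t1 = 0.7, -1.3
direct = (cmath.exp(1j * t0) - cmath.exp(1j * t1)) / (1j * (t0 - t1))
print(abs(spacing_cf([t0, t1]) - direct))
```

When some $t_j$ are nearly equal the sum suffers cancellation, which is the numerical counterpart of the remark that equal $t_j$ must be handled by continuity.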
If some of the $t_j$ are equal we proceed in the same manner. For instance if

    $t_j = t$ for $j = 0, 1, \ldots, \nu - 1$, and $t_j = 0$ for $j = \nu, \nu + 1, \ldots, n$,

we obtain the ch. fn. of $x_\nu = \sum_{j=0}^{\nu - 1} Y_j$, the $\nu$th smallest ordered observation from $n$ observations taken from a rectangular population. Then

    $E(e^{itx_\nu}) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty}\dfrac{e^{z}\,dz}{z^{n-\nu+1}(z - it)^{\nu}},$

which again can be evaluated by residues, albeit somewhat awkwardly since the poles are no longer simple. But the density for $x_\nu$ is simple to calculate by considering

    $\dfrac{1}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty}\dfrac{e^{xz}\,dz}{z^{n-\nu+1}(z - it)^{\nu}}$

as the inversion of the product of two Laplace transforms

    $\dfrac{1}{\Gamma(n - \nu + 1)}\displaystyle\int_0^\infty e^{-sz} s^{n-\nu}\,ds = \dfrac{1}{z^{n-\nu+1}},$

    $\dfrac{1}{\Gamma(\nu)}\displaystyle\int_0^\infty e^{-sz} e^{its} s^{\nu - 1}\,ds = \dfrac{1}{(z - it)^{\nu}}.$

Consequently taking the convolution and putting $x = 1$

    $E(e^{itx_\nu}) = \dfrac{n!}{\Gamma(n - \nu + 1)\Gamma(\nu)}\displaystyle\int_0^1 e^{its} s^{\nu - 1}(1 - s)^{n - \nu}\,ds$

and thus the density of $x_\nu$ is the Beta function $n!\,s^{\nu - 1}(1 - s)^{n - \nu}/\Gamma(n - \nu + 1)\Gamma(\nu)$ as is well known. Other properties also related to the distribution of order statistics from a uniform distribution which have been proved recently by Malmquist [10] may be treated in a like manner.
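A quick numerical check of this density: the sketch below evaluates $n!\,s^{\nu - 1}(1 - s)^{n - \nu}/\Gamma(n - \nu + 1)\Gamma(\nu)$ and confirms by a midpoint rule that it integrates to one and has mean $\nu/(n + 1)$. The helper `integrate` is a hypothetical utility written for this illustration, not a library routine.

```python
from math import factorial, gamma


def order_stat_density(s, n, v):
    """Density at s of the v-th smallest of n uniform observations on (0, 1)."""
    return factorial(n) * s ** (v - 1) * (1.0 - s) ** (n - v) / (
        gamma(n - v + 1) * gamma(v)
    )


def integrate(f, a, b, steps=20000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


n, v = 5, 2
total = integrate(lambda s: order_stat_density(s, n, v), 0.0, 1.0)
mean = integrate(lambda s: s * order_stat_density(s, n, v), 0.0, 1.0)
print(total, mean, v / (n + 1))
```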

An evaluation of the mixed moment $E(\prod_j Y_j^{a_j})$ is, of course, easily given in terms of the Dirichlet integral of the preceding section.


4. The distribution of $W_n$. The statistical problems mentioned in Section 1 may all be reduced to finding the distribution of $W_n = \sum_{j=0}^{n} h(Y_j)$ for certain functions $h(x)$.

By putting $f_j(x) = e^{i\xi h(x)}$ in (2.1) we obtain

(4.1)    $E(e^{i\xi W_n}) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{w}\Big(\int_0^\infty e^{-rw + i\xi h(r)}\,dr\Big)^{n+1} dw,$

and from this expression we propose to study the distribution of $W_n = \sum h(Y_j)$.


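As a preliminary we find the first two moments of $W_n$ which will prove useful in the work to follow. If $\int_0^1 h^k(r)\,dr$ is finite it is simple to see that (4.1) can be differentiated $k$ times under the integral sign with respect to $\xi$. Differentiating once and putting $\xi = 0$ we obtain

    $\mu_1 = E(W_n) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{w} w^{-n}(n + 1)\int_0^\infty e^{-rw} h(r)\,dr\,dw$

    $= \dfrac{(n + 1)!}{2\pi i}\displaystyle\int_0^\infty h(r)\int_{c-i\infty}^{c+i\infty} e^{w(1 - r)} w^{-n}\,dw\,dr$

    $= (n + 1)!\displaystyle\int_0^1 h(r)\dfrac{(1 - r)^{n-1}}{(n - 1)!}\,dr$

(4.2)    $= n(n + 1)\displaystyle\int_0^1 (1 - r)^{n-1} h(r)\,dr.$

Similarly by differentiating twice and setting $\xi = 0$ we obtain the second moment

    $\mu_2 = E(W_n^2) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{w}\Big\{\dfrac{n + 1}{w^{n}}\int_0^\infty e^{-rw} h^2(r)\,dr + \dfrac{n(n + 1)}{w^{n-1}}\Big(\int_0^\infty e^{-rw} h(r)\,dr\Big)^2\Big\}\,dw$

    $= n(n + 1)\displaystyle\int_0^1 (1 - r)^{n-1} h^2(r)\,dr + n(n + 1)!\int_0^\infty\!\int_0^\infty h(r_1) h(r_2)\,\dfrac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{w(1 - r_1 - r_2)} w^{-(n-1)}\,dw\,dr_1\,dr_2$

(4.3)    $= n(n + 1)\displaystyle\int_0^1 (1 - r)^{n-1} h^2(r)\,dr + n^2(n^2 - 1)\iint_{0 \le r_1 + r_2 \le 1,\; r_1 \ge 0,\; r_2 \ge 0} (1 - r_1 - r_2)^{n-2} h(r_1) h(r_2)\,dr_1\,dr_2.$

From (4.2) and (4.3) we can calculate the variance $\sigma^2 = \mu_2 - \mu_1^2$, and proceeding in a similar fashion we can develop all moments if they exist.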
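The first-moment formula, $E(W_n) = n(n + 1)\int_0^1 (1 - r)^{n-1} h(r)\,dr$, is easy to evaluate numerically for any bounded $h$. The sketch below uses a plain midpoint rule; the function name `mean_w` and the step count are choices made for this illustration. Two checks with known answers are $h(x) = x^2$, for which the Beta integral gives $2/(n + 2)$, and $h \equiv 1$, for which $W_n = n + 1$ identically.

```python
def mean_w(h, n, steps=20000):
    """E(W_n) = n (n + 1) * integral_0^1 (1 - r)^(n - 1) h(r) dr, midpoint rule."""
    width = 1.0 / steps
    acc = 0.0
    for i in range(steps):
        r = (i + 0.5) * width
        acc += (1.0 - r) ** (n - 1) * h(r)
    return n * (n + 1) * acc * width


n = 10
print(mean_w(lambda r: r * r, n), 2.0 / (n + 2))   # Greenwood's h(x) = x^2
print(mean_w(lambda r: 1.0, n), float(n + 1))      # h = 1 gives W_n = n + 1
```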


5. The distributions of Greenwood, Moran and Kimball. Greenwood [6] suggested $h(x) = x^2$, and Irwin in the discussion of his paper suggested $h(x) = (n + 1)^{-1}(x - 1/(n + 1))^2$. Moran [11] later found the limiting distribution of Greenwood's statistic was normal. Kimball [7] proposed $h(x) = x^a$ for $a > 0$ and found some partial results for the case $a = 2$.

In this section we find the limiting distribution for the case $h(x) = x^a$. We have the following theorem.

THEOREM 5.1. The random variable $W_n = \sum_{j=0}^{n} Y_j^a$, $a > 0$, $a \ne 1$, has a limiting normal distribution with the limiting mean and variance

    $\mu_n \sim \dfrac{\Gamma(a + 1)}{n^{a-1}},$

    $\sigma_n^2 \sim \dfrac{1}{n^{2a-1}}\big(\Gamma(2a + 1) - (a^2 + 1)\Gamma^2(a + 1)\big),$

respectively, that is,

    $\displaystyle\lim_{n \to \infty}\Pr\Big\{\dfrac{W_n - \mu_n}{\sigma_n} < x\Big\} = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$

for $\mu_n$ and $\sigma_n^2$ as above.

Of course if $a = 0$ or $a = 1$ we have $\sigma_n = 0$ and it might be proper to speak of $W_n$ as having a degenerate normal distribution.

This theorem will follow by applying a slight variation of the method of 
steepest descent, to the integral (4.1). The proof is given in a fair amount of 
detail and will serve as a model for the later distributions whose treatment 
follows essentially the same pattern and for which we give considerably less 
detailed proofs. 


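Substituting $h(x) = x^a$ in (4.2) and (4.3) we obtain for the first two moments

    $\mu_n = E(W_n) = \Gamma(a + 1)\dfrac{\Gamma(n + 2)}{\Gamma(n + a + 1)},$

    $\sigma_n^2 + \mu_n^2 = E(W_n^2) = \dfrac{\Gamma(n + 2)}{\Gamma(n + 2a + 1)}\big(\Gamma(2a + 1) + n\Gamma^2(a + 1)\big).$

The asymptotic character of these moments is easily obtained through the formula

    $\dfrac{\Gamma(n + 2)}{\Gamma(n + \beta)} = \dfrac{1}{(n + 2)^{\beta - 2}}\Big\{1 - \dfrac{(\beta - 2)(\beta - 3)}{2(n + 2)} + O\Big(\dfrac{1}{n^2}\Big)\Big\}, \qquad \beta \ge 0,$

giving

    $E(W_n) = \dfrac{\Gamma(a + 1)}{(n + 2)^{a-1}}\Big\{1 - \dfrac{(a - 1)(a - 2)}{2(n + 2)} + O\Big(\dfrac{1}{n^2}\Big)\Big\},$

    $E(W_n^2) = \dfrac{\Gamma^2(a + 1)}{(n + 2)^{2a-2}} + \dfrac{1}{(n + 2)^{2a-1}}\big(\Gamma(2a + 1) - \Gamma^2(a + 1)(2a^2 - 3a + 3)\big) + O\Big(\dfrac{1}{n^{2a}}\Big),$

from which we deduce

    $\mu_n \sim \dfrac{\Gamma(a + 1)}{n^{a-1}},$

    $\sigma_n^2 \sim \dfrac{1}{n^{2a-1}}\big(\Gamma(2a + 1) - (a^2 + 1)\Gamma^2(a + 1)\big).$

Thus for $a > \frac{1}{2}$, $\sigma_n^2 \to 0$ and for $a < \frac{1}{2}$, $\sigma_n^2 \to \infty$, while in the transitional case $a = \frac{1}{2}$ we have $\sigma_n^2 \to 1 - 5\pi/16$.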
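The limiting variance constant and the exact moments can be compared numerically. The sketch below assumes the exact expressions $E(W_n) = \Gamma(a + 1)\Gamma(n + 2)/\Gamma(n + a + 1)$ and $E(W_n^2) = \Gamma(n + 2)\big(\Gamma(2a + 1) + n\Gamma^2(a + 1)\big)/\Gamma(n + 2a + 1)$, which follow from the Beta and Dirichlet integrals for $h(x) = x^a$; the function names are hypothetical. Log-gamma is used so that large $n$ does not overflow.

```python
import math


def limiting_variance_constant(a):
    """Gamma(2a + 1) - (a^2 + 1) Gamma(a + 1)^2, the constant in sigma_n^2."""
    return math.gamma(2 * a + 1) - (a * a + 1) * math.gamma(a + 1) ** 2


def exact_moments(a, n):
    """Exact mean and variance of W_n = sum_j Y_j^a for n random points."""
    mean = math.exp(math.lgamma(a + 1) + math.lgamma(n + 2) - math.lgamma(n + a + 1))
    second = math.exp(math.lgamma(n + 2) - math.lgamma(n + 2 * a + 1)) * (
        math.gamma(2 * a + 1) + n * math.gamma(a + 1) ** 2
    )
    return mean, second - mean * mean


print(limiting_variance_constant(0.5), 1 - 5 * math.pi / 16)  # transitional case
print(limiting_variance_constant(2.0))                        # Greenwood: 4
n = 1000
print(n ** 3 * exact_moments(2.0, n)[1])                      # approaches 4
```

The constant vanishes at $a = 1$, in line with the degenerate case noted after Theorem 5.1. The variance is formed as a difference of nearly equal numbers, so for very large $n$ the subtraction loses digits.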


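Using (4.1) we obtain for the ch. fn. of $W_n = \sum Y_j^a$

    $\varphi_n(\xi) = E(\exp(i\xi W_n)) = \dfrac{n!}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{w}\Big(\int_0^\infty e^{-rw + i\xi r^a}\,dr\Big)^{n+1} dw$

for $c > 0$. Letting $\xi = (n + 1)^{a - 1/2}\,t$, $w = (n + 1)z$ and shifting the contour parallel with itself we find

(5.2)    $\varphi_n\big((n + 1)^{a - 1/2}\,t\big) = \dfrac{(n + 1)!}{(n + 1)^{n+1}}\,\dfrac{1}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} e^{(n+1)z} z^{-n-1}\big(B_n(z, t)\big)^{n+1}\,dz,$

where

(5.3)    $B_n(z, t) = (n + 1)z\displaystyle\int_0^\infty e^{-(n+1)rz + it(n+1)^{a - 1/2} r^a}\,dr.$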


Now it will turn out that $(B_n(z, t))^{n+1}$ is, aside from a multiplicative factor depending on $n$ and $t$ but independent of $z$, actually a bounded function approaching a limit as $n \to \infty$ for $|t|$ bounded and $z$ arbitrary. This suggests that relative to the dominant term $e^{(n+1)z} z^{-(n+1)}$ this factor will cause negligible interference when $n \to \infty$ (cf. Szegő [14], p. 220, who treats an example very similar to this).

If we write $e^{(n+1)z} z^{-(n+1)} = e^{(n+1)f(z)}$ then $f(z) = z - \log z$ where $\log z$ is real when $z$ is real and positive. Then since $f'(1) = 0$, $f''(1) = 1$, the saddle point is $z = 1$ with the critical direction parallel to the imaginary axis. Hence in (5.2) we merely take $c = 1$ to get the contour of steepest descent.

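Thus we put

(5.4)    $z = 1 + \dfrac{iy}{\sqrt{n + 1}}, \qquad dz = \dfrac{i\,dy}{\sqrt{n + 1}}$

for $y$ in the domain

(5.5)    $-(n + 1)^{\delta} < y < (n + 1)^{\delta}, \qquad 0 < \delta < \tfrac{1}{6},$

and the entire integral has its essential contribution in this range; the value of the integral extended over the range complementary to (5.5) becomes negligible as $n \to \infty$ after we have modified $B_n(z, t)$ by a factor independent of $z$. With the substitution (5.4) we find

(5.6)    $\varphi_n\big((n + 1)^{a - 1/2}\,t\big) = \dfrac{(n + 1)!\,e^{n+1}}{(n + 1)^{n+1}}\,\dfrac{1}{2\pi\sqrt{n + 1}}\displaystyle\int_{-(n+1)^{\delta}}^{(n+1)^{\delta}} e^{-y^2/2}\big(B_n(z, t)\big)^{n+1}\,dy\,(1 + o(1))$

    $= \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-(n+1)^{\delta}}^{(n+1)^{\delta}} e^{-y^2/2}\big(B_n(z, t)\big)^{n+1}\,dy\,(1 + o(1))$

by using Stirling's formula for $(n + 1)!$.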


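Next we turn to $B_n(z, t)$ as given by (5.3). By standard methods we obtain
an asymptotic expansion (cf. Watson [16])

$$\begin{aligned}
B_n(z, t) &= (n + 1)z\int_0^{\infty} e^{-(n+1)zr}\Bigl\{1 + it(n + 1)^{\alpha - 1/2}r^{\alpha} + \tfrac12 (it)^2 (n + 1)^{2\alpha - 1}r^{2\alpha} + \cdots\Bigr\}\,dr \\
&= 1 + \frac{it\,\Gamma(\alpha + 1)}{(n + 1)^{1/2}z^{\alpha}} + \frac{(it)^2\,\Gamma(2\alpha + 1)}{2(n + 1)z^{2\alpha}} + \cdots
\end{aligned}$$

and for $z$ as in (5.4) and $y$ in the range (5.5)

$$\begin{aligned}
B_n(z, t) &= 1 + \frac{it\,\Gamma(\alpha + 1)}{(n + 1)^{1/2}}\left(1 + \frac{iy}{\sqrt{n + 1}}\right)^{-\alpha} + \frac{(it)^2\,\Gamma(2\alpha + 1)}{2(n + 1)}\left(1 + \frac{iy}{\sqrt{n + 1}}\right)^{-2\alpha}(1 + o(1)) \\
&= 1 + \frac{it\,\Gamma(\alpha + 1)}{(n + 1)^{1/2}} + \frac{ty\alpha\,\Gamma(\alpha + 1) + \tfrac12 (it)^2\,\Gamma(2\alpha + 1)}{n + 1} + o(1/n)
\end{aligned}$$

so that

$$\begin{aligned}
(n + 1)\log B_n(z, t) &= (n + 1)\log\left\{1 + \frac{it\,\Gamma(\alpha + 1)}{(n + 1)^{1/2}} + \frac{ty\alpha\,\Gamma(\alpha + 1) + \tfrac12 (it)^2\,\Gamma(2\alpha + 1)}{n + 1} + o(1/n)\right\} \\
&= (n + 1)^{1/2}\,it\,\Gamma(\alpha + 1) + ty\alpha\,\Gamma(\alpha + 1) + \tfrac12 (it)^2\bigl(\Gamma(2\alpha + 1) - \Gamma^2(\alpha + 1)\bigr) + o(1).
\end{aligned}$$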


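Using this estimate in (5.6) we obtain

$$\varphi_n\bigl((n + 1)^{\alpha - 1/2}t\bigr)\exp\bigl(-(n + 1)^{1/2}\,it\,\Gamma(\alpha + 1)\bigr) = \exp\bigl\{-\tfrac12 t^2\bigl(\Gamma(2\alpha + 1) - \Gamma^2(\alpha + 1)\bigr)\bigr\}\,\frac{1}{\sqrt{2\pi}}\int_{-(n+1)^{\delta}}^{(n+1)^{\delta}} e^{-y^2/2 + ty\alpha\Gamma(\alpha + 1)}\,dy\,(1 + o(1))$$

and hence

$$\begin{aligned}
\lim_{n\to\infty} E\bigl[\exp\bigl(it\bigl((n + 1)^{\alpha - 1/2}W_n - (n + 1)^{1/2}\Gamma(\alpha + 1)\bigr)\bigr)\bigr]
&= \lim_{n\to\infty} E\left(\exp\left(it\,\frac{W_n - (n + 1)^{1 - \alpha}\Gamma(\alpha + 1)}{(n + 1)^{1/2 - \alpha}}\right)\right) \\
&= \exp\bigl\{-\tfrac12 t^2\bigl(\Gamma(2\alpha + 1) - \Gamma^2(\alpha + 1)\bigr)\bigr\}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2 + ty\alpha\Gamma(\alpha + 1)}\,dy \\
&= \exp\bigl\{-\tfrac12 t^2\bigl(\Gamma(2\alpha + 1) - (\alpha^2 + 1)\Gamma^2(\alpha + 1)\bigr)\bigr\}
\end{aligned}$$

which establishes the theorem, and gives an independent derivation for the
asymptotic moments.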


6. The distribution of Sherman. To avoid some of the difficulties pertaining
to the case $h(x) = x^{\alpha}$ Sherman [13] considered the case $h(x) = \tfrac12\,|x - 1/(n + 1)|$.
Kendall ([6], discussion) had suggested that such a function might be easier to
treat because of the simplification of the geometry of the integration. Sherman
gave the distribution of $W_n = \tfrac12\sum_{j=0}^{n} |Y_j - 1/(n + 1)|$ and proved it had a
limiting Gaussian distribution.

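In this section we develop the distribution of $W_n$ using (4.1). Here the inner
integration can be performed explicitly and the analysis is much simpler. We
have in fact

$$\int_0^{\infty} e^{-zr + (i\xi/2)\,|r - 1/(n+1)|}\,dr = \frac{e^{i\xi/2(n+1)} - e^{-z/(n+1)}}{z + \tfrac12 i\xi} + \frac{e^{-z/(n+1)}}{z - \tfrac12 i\xi} \tag{6.1}$$

so that using (4.1)

$$\begin{aligned}
\varphi_n(\xi) &= \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}\left(\frac{e^{i\xi/2(n+1)} - e^{-z/(n+1)}}{z + \tfrac12 i\xi} + \frac{e^{-z/(n+1)}}{z - \tfrac12 i\xi}\right)^{n+1} dz \\
&= \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty}\left(\frac{e^{(z + i\xi/2)/(n+1)} - 1}{z + \tfrac12 i\xi} + \frac{1}{z - \tfrac12 i\xi}\right)^{n+1} dz \\
&= \frac{n!}{2\pi i}\sum_{j=0}^{n+1}\binom{n + 1}{j}\int_{c - i\infty}^{c + i\infty}\left(\frac{e^{(z + i\xi/2)/(n+1)} - 1}{z + \tfrac12 i\xi}\right)^{j}\frac{dz}{(z - \tfrac12 i\xi)^{n+1-j}} \\
&= n!\sum_{j=1}^{n}\binom{n + 1}{j}\frac{1}{(n - j)!}\,\frac{d^{\,n-j}}{d(i\xi)^{n-j}}\left(\frac{e^{i\xi/(n+1)} - 1}{i\xi}\right)^{j}
\end{aligned}$$

by a simple application of the theory of residues. From this ch. fn. we can easily
deduce the density for $W_n$.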
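We rewrite the preceding expression

$$\varphi_n(\xi) = n!\sum_{j=1}^{n}\binom{n + 1}{j}\frac{1}{(n - j)!\,(n + 1)^{j}}\,\frac{d^{\,n-j}}{d(i\xi)^{n-j}}\left(\frac{e^{i\xi/(n+1)} - 1}{i\xi/(n + 1)}\right)^{j}$$

and invert termwise. Let $X_1, X_2, \cdots$ be independent and uniformly distributed
over (0, 1). The density of $X_1 + X_2 + \cdots + X_j$ is then (cf. Cramér [2], p. 245)

$$f_j(x) = \frac{1}{(j - 1)!}\sum_{0 \le k < x}(-1)^k\binom{j}{k}(x - k)^{j-1}, \qquad 0 < x < j. \tag{6.2}$$

Then the density for $(X_1 + X_2 + \cdots + X_j)/(n + 1)$ is $(n + 1)f_j((n + 1)x)$
and the ch. fn. for it is

$$\left(\frac{e^{i\xi/(n+1)} - 1}{i\xi/(n + 1)}\right)^{j}.$$

Hence

$$\frac{d^{\,n-j}}{d(i\xi)^{n-j}}\left(\frac{e^{i\xi/(n+1)} - 1}{i\xi/(n + 1)}\right)^{j} = \int_0^{\infty} e^{i\xi x}\,x^{n-j}(n + 1)f_j((n + 1)x)\,dx.$$

Having inverted the typical term in $\varphi_n(\xi)$ we obtain for the density of $W_n$

$$n!\sum_{j=1}^{n}\binom{n + 1}{j}\frac{x^{n-j}f_j((n + 1)x)}{(n - j)!\,(n + 1)^{j-1}}$$

with $f_j(x)$ as in (6.2).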
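The termwise inversion can be sketched in Python as follows. This is a minimal illustration rather than Sherman's or the author's computation: it codes the density of a sum of $j$ uniform variables as in (6.2) and the resulting density of $W_n = \tfrac12\sum|Y_j - 1/(n+1)|$, with the sum taken over $j = 1, \cdots, n$. The function names are hypothetical.

```python
import math


def irwin_hall_density(j, x):
    """Density f_j(x) of X_1 + ... + X_j for independent uniform(0, 1) X_i, as in (6.2)."""
    if j < 1 or not (0.0 < x < j):
        return 0.0
    total = 0.0
    k = 0
    while k < x:  # sum over the integers 0 <= k < x
        total += (-1) ** k * math.comb(j, k) * (x - k) ** (j - 1)
        k += 1
    return total / math.factorial(j - 1)


def sherman_density(n, x):
    """Density of W_n = (1/2) * sum_j |Y_j - 1/(n+1)| obtained by termwise inversion."""
    total = 0.0
    for j in range(1, n + 1):
        weight = math.comb(n + 1, j) / (math.factorial(n - j) * (n + 1) ** (j - 1))
        total += weight * x ** (n - j) * irwin_hall_density(j, (n + 1) * x)
    return math.factorial(n) * total


if __name__ == "__main__":
    # For n = 1 the statistic is |Y_0 - 1/2| with Y_0 uniform, so the density is 2 on (0, 1/2).
    print(sherman_density(1, 0.25), sherman_density(1, 0.6))
```

For $n = 1$ the formula reduces to the uniform density on $(0, \tfrac12)$, which gives a quick hand check; for larger $n$ the density can be integrated numerically over $(0, n/(n+1))$ to confirm that it has total mass one.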





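It is also simple to get an asymptotic distribution for $W_n$, following the pattern
of Section 5 exactly. If we put $z = (n + 1) + (n + 1)^{1/2}\,iy$, $\xi = (n + 1)^{1/2}\,t$ in
(6.1) we obtain after some easy estimates

$$\int_0^{\infty} e^{-rz + (i\xi/2)\,|r - 1/(n+1)|}\,dr = \frac{1}{z}\left\{1 + \frac{it\,e^{-1}}{(n + 1)^{1/2}} + \frac{ty\,(2e^{-1} - \tfrac12)}{n + 1} - \frac{t^2}{8(n + 1)} + o(1/n)\right\}$$

and we choose the same contour as before with $c = 1$. These same substitutions
yield

$$n!\,e^{z}\,dz/z^{n+1} = i\sqrt{2\pi}\,e^{-y^2/2}\,dy\,(1 + o(1))$$

as in the preceding example so that

$$\varphi_n\bigl((n + 1)^{1/2}t\bigr)\,e^{-it(n+1)^{1/2}e^{-1}} \to \frac{e^{t^2 e^{-2}/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2 + ty(2e^{-1} - 1/2) - t^2/8}\,dy,$$

$$E\bigl(e^{it(n+1)^{1/2}(W_n - e^{-1})}\bigr) \to e^{-(t^2/2)(2e^{-1} - 5e^{-2})},$$

which exhibits the approach of $W_n$ to the normal distribution.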


7. Other possibilities. If we put $h(x) = \log x$ we can evaluate $\varphi_n(\xi)$ explicitly,
obtaining

$$\varphi_n(\xi) = \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}\left(\int_0^{\infty} e^{-rz}r^{i\xi}\,dr\right)^{n+1} dz = \frac{n!\,\Gamma^{n+1}(i\xi + 1)}{\Gamma\bigl((n + 1)(i\xi + 1)\bigr)}.$$

Setting $\xi = (n + 1)^{-1/2}\,t$ and using Stirling's formula we get

$$\log\varphi_n\bigl((n + 1)^{-1/2}t\bigr) = -it(n + 1)^{1/2}(\log n + \gamma) - \tfrac12 t^2(\pi^2/6 - 1) + o(1)$$

and it follows that $\sum_{j=0}^{n}\log Y_j$ is asymptotically normally distributed with
asymptotic mean and variance $-(n + 1)(\log n + \gamma)$ and $(n + 1)(\pi^2/6 - 1)$
respectively, $\gamma$ being Euler's constant, $\gamma = .577\cdots$.
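The asymptotic mean and variance of $\sum \log Y_j$ can be compared with exact finite-$n$ values. The sketch below is an illustration under an assumption not stated in the text: that $(Y_0, \cdots, Y_n)$ is uniformly distributed on the simplex, so that $E\log Y_j = -H_n$ (with $H_n$ the $n$th harmonic number), $\operatorname{Var}\log Y_j = \pi^2/6 - \psi'(n+1)$, and $\operatorname{Cov}(\log Y_i, \log Y_j) = -\psi'(n+1)$, where $\psi'(n+1) = \pi^2/6 - \sum_{k \le n} k^{-2}$. The function name is ours.

```python
import math


def log_spacing_moments(n):
    """Exact mean and variance of sum_j log Y_j for n uniform points (n + 1 spacings)."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))  # H_n
    # trigamma(n + 1) = pi^2/6 - sum_{k=1}^{n} 1/k^2
    trigamma = math.pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, n + 1))
    mean = -(n + 1) * harmonic
    var = (n + 1) * (math.pi ** 2 / 6 - (n + 1) * trigamma)
    return mean, var


if __name__ == "__main__":
    n = 10000
    mean, var = log_spacing_moments(n)
    euler_gamma = 0.5772156649015329
    print(mean / (n + 1), -(math.log(n) + euler_gamma))
    print(var / (n + 1), math.pi ** 2 / 6 - 1)
```

Since $(n+1)\psi'(n+1) \to 1$, the exact variance per interval approaches $\pi^2/6 - 1$, in agreement with the expansion of $\log\varphi_n$.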

In the preceding examples we have always obtained a limiting normal distri- 
bution and it seems a reasonable conjecture in analogy with the central limit 
theorem that we will generally obtain the asymptotic Gaussian distribution 
when the two moments (4.2) and (4.3) exist. But it appears very difficult to 
prove a theorem of this generality. We next give an example for which we do 
not obtain the normal distribution. 

Let $h(x) = 1/x$; then since

$$\int_0^{\infty} e^{-zr - \xi/r}\,dr = 2\sqrt{\xi/z}\;K_1\bigl(2\sqrt{\xi z}\bigr), \qquad \xi > 0,\ \operatorname{Re} z > 0,$$

where $K_1(z)$ is the Bessel function (cf. Watson [16]), we have for the Laplace
transform of the density of $W_n = \sum_{j=0}^{n} Y_j^{-1}$

$$E(e^{-\xi W_n}) = \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}\bigl(2\sqrt{\xi/z}\;K_1(2\sqrt{\xi z})\bigr)^{n+1}\,dz.$$

Again letting $z = (n + 1) + iy(n + 1)^{1/2}$ and $\xi = t/(n + 1)$ we have
$2\sqrt{\xi z}\,K_1(2\sqrt{\xi z}) \to 2\sqrt{t}\,K_1(2\sqrt{t})$ and this expression is the Laplace transform for
the density whose cdf is $e^{-1/x}$, $0 < x < \infty$. It will follow then that $W_n/(n + 1)$
has the same limiting distribution as the sum of $(n + 1)$ independent random
variables each having a cdf $e^{-1/x}$, and thus that this limiting distribution is a
quasi-stable law of exponent 1 (cf. Lévy [8], p. 208).


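8. The number of intervals satisfying certain inequalities. Let $N_n(\alpha, \beta)$ be
the number of those $Y_j$ which satisfy $\alpha < Y_j < \beta$ for $j = 0, 1, \cdots, n$. A number
of statistical problems relate to the distribution of $N_n(\alpha, \beta)$ as we have outlined
in Section 1.

If we put

$$h(r) = \begin{cases} 1 & \alpha < r < \beta \\ 0 & \text{otherwise} \end{cases}$$

then $N_n(\alpha, \beta) = \sum_{j=0}^{n} h(Y_j)$ and our preceding discussion is applicable in studying
the distribution of this random variable.

Using (4.1) we have

$$\begin{aligned}
E\bigl(e^{i\xi N_n(\alpha, \beta)}\bigr) &= \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}\left(\int_0^{\infty} e^{-rz + i\xi h(r)}\,dr\right)^{n+1} dz \\
&= \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}z^{-n-1}\bigl\{1 + (e^{i\xi} - 1)(e^{-z\alpha} - e^{-z\beta})\bigr\}^{n+1}\,dz
\end{aligned} \tag{8.1}$$

and this expression will be a basis for the analysis of $N_n$.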

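“Most” of the $Y_j$ are presumably of the order of magnitude $(n + 1)^{-1}$, and
we first find the asymptotic distribution of the number of those $Y_j$ which lie
between $a/(n + 1)$ and $b/(n + 1)$.

THEOREM 8.1. The random variable $N_n(a/(n + 1), b/(n + 1))$ is asymptotically
normally distributed with an asymptotic mean and variance

$$\mu_n \sim (n + 1)(e^{-a} - e^{-b})$$

$$\sigma_n^2 \sim (n + 1)\bigl(e^{-a} - e^{-b} - (e^{-a} - e^{-b})^2 - (ae^{-a} - be^{-b})^2\bigr).$$

The proof parallels the analysis of Section 5. Putting $z = (n + 1) + (n + 1)^{1/2}\,iy$,
$\xi = (n + 1)^{-1/2}\,t$ in (8.1) we deduce easily

$$1 + (e^{i\xi} - 1)\bigl(e^{-za/(n+1)} - e^{-zb/(n+1)}\bigr) = 1 + \frac{it(e^{-a} - e^{-b})}{(n + 1)^{1/2}} - \frac{t^2(e^{-a} - e^{-b})}{2(n + 1)} + \frac{ty}{n + 1}(ae^{-a} - be^{-b}) + o(1/n)$$

and thus

$$E\left(\exp\left\{\frac{it}{(n + 1)^{1/2}}N_n\bigl(a/(n + 1), b/(n + 1)\bigr) - it(e^{-a} - e^{-b})(n + 1)^{1/2}\right\}\right) \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2 - (t^2/2)\{(e^{-a} - e^{-b}) - (e^{-a} - e^{-b})^2\} + ty(ae^{-a} - be^{-b})}\,dy,$$

$$E\left(\exp\left\{it\,\frac{N_n\bigl(a/(n + 1), b/(n + 1)\bigr) - (n + 1)(e^{-a} - e^{-b})}{(n + 1)^{1/2}}\right\}\right) \to e^{-(t^2/2)\left(e^{-a} - e^{-b} - (e^{-a} - e^{-b})^2 - (ae^{-a} - be^{-b})^2\right)},$$

which proves the theorem.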
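The limiting variance can be checked against an exact finite-$n$ computation. The sketch below is illustrative only and rests on a standard fact that is assumed here rather than taken from the text: for the spacings of $n$ uniform points, $\Pr\{Y_0 > x, Y_1 > y\} = (1 - x - y)_+^n$. From it one gets the exact mean and variance of the count $N_n(a/(n+1), b/(n+1))$, which are compared with $(n+1)(p - p^2 - q^2)$, where $p = e^{-a} - e^{-b}$ and $q = ae^{-a} - be^{-b}$. The function name is hypothetical.

```python
import math


def exact_count_moments(n, a, b):
    """Exact mean and variance of N_n(a/(n+1), b/(n+1)) for 0 <= a < b and n >= 1."""
    m = n + 1
    lo, hi = a / m, b / m

    def surv(u):
        # (u)_+^n: probability that the spacings involved exceed thresholds summing to 1 - u
        return max(u, 0.0) ** n

    p1 = surv(1 - lo) - surv(1 - hi)                                # one spacing in (lo, hi)
    p2 = surv(1 - 2 * lo) - 2 * surv(1 - lo - hi) + surv(1 - 2 * hi)  # two spacings in (lo, hi)
    mean = m * p1
    var = m * p1 + m * (m - 1) * p2 - mean ** 2
    return mean, var


if __name__ == "__main__":
    n, a, b = 20000, 0.5, 2.0
    mean, var = exact_count_moments(n, a, b)
    p = math.exp(-a) - math.exp(-b)
    q = a * math.exp(-a) - b * math.exp(-b)
    print(mean / (n + 1), p)
    print(var / (n + 1), p - p * p - q * q)
```

A useful sanity check is the degenerate case $a = 0$ with $b$ very large: every interval is then counted, $N_n = n + 1$ identically, and both the exact and the limiting variance are zero.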

We next analyze the distribution of the number of “small” $Y_j$. It turns out
that with probability 1 only finitely many are of the order of magnitude
$(n + 1)^{-2}$ as $n \to \infty$.

THEOREM 8.2. $N_n(a/(n + 1)^2, b/(n + 1)^2)$ has an asymptotic Poisson distribution
with parameter $(b - a)$. That is

$$\lim_{n\to\infty}\Pr\{N_n(a/(n + 1)^2, b/(n + 1)^2) = k\} = e^{-(b - a)}\,\frac{(b - a)^k}{k!}, \qquad k = 0, 1, \cdots.$$

To prove the theorem we put $\alpha = a/(n + 1)^2$, $\beta = b/(n + 1)^2$ and
$z = (n + 1) + (n + 1)^{1/2}\,iy$ in (8.1), giving

$$\bigl\{1 + (e^{i\xi} - 1)(e^{-z\alpha} - e^{-z\beta})\bigr\}^{n+1} = e^{(e^{i\xi} - 1)(b - a)}(1 + o(1)),$$

and we have, arguing as before,

$$E\bigl(e^{i\xi N_n(a/(n+1)^2,\,b/(n+1)^2)}\bigr) \to e^{(e^{i\xi} - 1)(b - a)},$$

which establishes the theorem.
The distribution of the number of “large” $Y_j$ proceeds in a similar way.

THEOREM 8.3. $N_n((\log(n + 1) + a)/(n + 1), (\log(n + 1) + b)/(n + 1))$
has an asymptotic Poisson distribution with parameter $(e^{-a} - e^{-b})$;

$$\lim_{n\to\infty}\Pr\left\{N_n\left(\frac{\log(n + 1) + a}{n + 1}, \frac{\log(n + 1) + b}{n + 1}\right) = k\right\} = e^{-(e^{-a} - e^{-b})}\,\frac{(e^{-a} - e^{-b})^k}{k!}, \qquad k = 0, 1, \cdots.$$

Thus only finitely many intervals are as large as $\log n/n$ asymptotically with
probability 1. To prove the theorem we put

$$\alpha = \frac{\log(n + 1) + a}{n + 1}, \qquad \beta = \frac{\log(n + 1) + b}{n + 1}$$

in (8.1) and take $z = (n + 1) + (n + 1)^{1/2}\,iy$ giving

$$\bigl\{1 + (e^{i\xi} - 1)(e^{-z\alpha} - e^{-z\beta})\bigr\}^{n+1} \to e^{(e^{i\xi} - 1)(e^{-a} - e^{-b})}$$

and the rest of the proof proceeds as before.
From Theorems 8.2 and 8.3 we can find the asymptotic distribution of the largest $Y_j$
and the smallest $Y_j$. Let, in fact, $U_n = \min(Y_0, Y_1, \cdots, Y_n)$ and $V_n =
\max(Y_0, Y_1, \cdots, Y_n)$. Then putting $a = 0$, $k = 0$ in Theorem 8.2 and $b = \infty$,
$k = 0$ in Theorem 8.3 we obtain

$$\lim_{n\to\infty}\Pr\{U_n > b/(n + 1)^2\} = e^{-b}, \qquad 0 < b < \infty,$$

$$\lim_{n\to\infty}\Pr\left\{V_n < \frac{\log(n + 1) + a}{n + 1}\right\} = e^{-e^{-a}}, \qquad -\infty < a < \infty.$$

These two expressions were given by Lévy [9] using geometrical arguments.
It is possible to show that $U_n$ and $V_n$ are, besides, asymptotically independent.
If we put $\alpha = a/(n + 1)^2$ and $\beta = (\log(n + 1) + b)/(n + 1)$ in (8.1) and duplicate the
above reasoning we get

$$\lim_{n\to\infty}\Pr\left\{U_n > a/(n + 1)^2,\ V_n < \frac{\log(n + 1) + b}{n + 1}\right\} = e^{-a - e^{-b}}, \qquad 0 < a < \infty,\ -\infty < b < \infty.$$

However by taking a different attack we can get more precise information
about the joint distribution of $U_n$ and $V_n$.
THEOREM 8.4.

$$\Pr\{U_n > \alpha,\ V_n < \beta\} = \Pr\{\alpha < Y_j < \beta,\ j = 0, 1, \cdots, n\} = {\sum_{j}}^{*}\binom{n + 1}{j}(-1)^j\bigl(1 - \alpha(n + 1 - j) - \beta j\bigr)^n \tag{8.2}$$

where $\sum^{*}$ means to include only those terms for which $1 - \alpha(n + 1 - j) - \beta j$
is positive, $j = 0, 1, \cdots$.

The required probability is clearly the probability that $N_n(\alpha, \beta)$ is equal to
$(n + 1)$. Hence in (8.1) if we expand the factor in braces and select the coefficient
of $e^{i\xi(n+1)}$ we get

$$\begin{aligned}
\Pr\{N_n(\alpha, \beta) = n + 1\} &= \frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z}z^{-n-1}(e^{-z\alpha} - e^{-z\beta})^{n+1}\,dz \\
&= \sum_{j=0}^{n+1}\binom{n + 1}{j}(-1)^j\,\frac{n!}{2\pi i}\int_{c - i\infty}^{c + i\infty} e^{z(1 - \alpha(n + 1 - j) - \beta j)}z^{-n-1}\,dz
\end{aligned}$$

and this is equal to (8.2) by a direct application of the residue theorem.
Putting $\alpha = 0$ we obtain the probability that all intervals $Y_j$ are less than $\beta$

$$\Pr\{V_n < \beta\} = \sum_{0 \le j < 1/\beta}\binom{n + 1}{j}(-1)^j(1 - \beta j)^n,$$

a result going back to Whitworth [18] and used by Fisher [4] in studying the
significance of the largest amplitude in harmonic analysis, and by Garwood [5]
in traffic studies. Setting $\beta = 1$ in (8.2) we have only the term corresponding to
$j = 0$ in the series, and the distribution of the minimum of the $Y_j$ becomes

$$\Pr\{U_n > \alpha\} = (1 - (n + 1)\alpha)^n, \qquad \alpha < 1/(n + 1),$$

which is also a result of considerable age.
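Formula (8.2) is easy to evaluate directly. The following sketch is an illustration with a hypothetical function name; it sums the admissible terms, working with logarithms of the binomial coefficients so that larger values of $n$ do not overflow, and compares the exact probabilities for the minimum and the maximum with the limits $e^{-b}$ and $e^{-e^{-a}}$ stated above.

```python
import math


def prob_all_between(n, alpha, beta):
    """Pr{alpha < Y_j < beta for j = 0..n} from (8.2); only terms with a positive base are kept."""
    total = 0.0
    for j in range(n + 2):
        base = 1.0 - alpha * (n + 1 - j) - beta * j
        if base > 0.0:
            # log of binom(n + 1, j) * base**n, computed via lgamma to avoid overflow
            log_term = (math.lgamma(n + 2) - math.lgamma(j + 1)
                        - math.lgamma(n + 2 - j) + n * math.log(base))
            total += (-1) ** j * math.exp(log_term)
    return total


if __name__ == "__main__":
    n = 2000
    b = 1.5
    print(prob_all_between(n, b / (n + 1) ** 2, 1.0), math.exp(-b))          # minimum
    a = 0.0
    beta = (math.log(n + 1) + a) / (n + 1)
    print(prob_all_between(n, 0.0, beta), math.exp(-math.exp(-a)))          # maximum
```

For $n = 1$ the two intervals are $Y_0$ and $1 - Y_0$ with $Y_0$ uniform, so that, for instance, $\Pr\{V_1 < 0.75\} = 0.5$; this gives a simple hand check of the alternating sum.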







There are also interesting relationships between the distributions of $U_n$ and
$V_n$ with the work of Robbins [12] and Votaw [15] on the measure of a random
set.

By using (8.2) it would be easy to find the distribution of $V_n - U_n$ or $V_n/U_n$
and, as suggested by Kendall ([6], discussion), these might be better statistics
to test for the equality of the $Y_j$ than the statistics $W_n$ discussed in Sections
5, 6, and 7 above.


REFERENCES 
[1] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain 'goodness of
fit' criteria based on stochastic processes," Ann. Math. Stat., Vol. 23 (1952),
pp. 193-212.
[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[3] W. Feller, An Introduction to Probability Theory, John Wiley and Sons, 1950.
[4] R. A. Fisher, "Tests of significance in harmonic analysis," Proc. Roy. Soc. Edinburgh,
Sect. A., Vol. 125 (1929), pp. 54-59.
[5] F. Garwood, "An application of the theory of probability to the operation of
vehicular-controlled traffic signals," J. Roy. Stat. Soc. Suppl., Vol. 7 (1940), pp.
65-77.
[6] M. Greenwood, "The statistical study of infectious diseases," J. Roy. Stat. Soc.,
Vol. 109 (1946), pp. 85-110.
[7] B. F. Kimball, "Some basic theorems for developing tests of fit for the case of
nonparametric probability distribution functions," Ann. Math. Stat., Vol. 18 (1947),
pp. 540-548.
[8] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937.
[9] Paul Lévy, "Sur la division d'un segment par des points choisis au hasard," C. R.
Acad. Sci. Paris, Vol. 208 (1939), pp. 147-149.
[10] S. Malmquist, "On a property of order statistics from a rectangular distribution,"
Skand. Aktuarietids., Vol. 33 (1950), pp. 214-222.
[11] P. Moran, "The random division of an interval," J. Roy. Stat. Soc. Suppl., Vol. 9
(1947), pp. 92-98.
[12] H. E. Robbins, "On the expected value of two statistics," Ann. Math. Stat., Vol. 15
(1944), pp. 321-323.
[13] B. Sherman, "A random variable related to the spacing of sample values," Ann.
Math. Stat., Vol. 21 (1950), pp. 339-361.
[14] G. Szegő, "Orthogonal polynomials," Amer. Math. Soc. Colloq. Publ., Vol. 23 (1939).
[15] D. F. Votaw, "The probability distribution of the measure of a random set," Ann.
Math. Stat., Vol. 17 (1946), pp. 240-244.
[16] G. N. Watson, Theory of Bessel Functions, Cambridge University Press, 1944.
[17] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge University Press,
1927.
[18] W. A. Whitworth, Choice and Chance, Cambridge University Press, 1897.





SEQUENTIAL DECISION PROBLEMS FOR PROCESSES WITH CONTINUOUS
TIME PARAMETER. TESTING HYPOTHESES¹


A. DVORETZKY, J. KIEFER AND J. WOLFOWITZ


Hebrew University, Jerusalem, Cornell University and University of California at
Los Angeles


Summary. The purpose of the present paper is to contribute to the sequential 
theory of testing hypotheses about stochastic processes with a continuous parameter
(say, $t$, which one may think of as time). Sequential decision problems about
such processes seem not to have been treated before. Subsequently we shall 
treat problems of point and interval estimation and general sequential decision 
problems for such processes. The results, in addition to their interest per se and 
their practical importance, also shed light on the corresponding results for dis- 
crete stochastic processes. The subjects of sequential analysis and the theory of 
decision functions were founded by Wald, and we treat our present subjects in 
the spirit of his approach. The general results of decision theory, such as the 
complete class theorem, carry over to sequential problems about stochastic 
processes with continuous time parameter. As specific examples we treat the 
Wiener and Poisson processes and obtain, for example, the exact power function. 
(For discrete processes the corresponding known results, due to Wald, are ap- 
proximations). 

1. Introduction. Let $\{x_1(t), t \ge 0\}$ and $\{x_2(t), t \ge 0\}$ be two different stochastic
processes. The statistician observes continuously, beginning at $t = 0$, a process
$\{x(t), t \ge 0\}$ which is either $\{x_1(t)\}$ or $\{x_2(t)\}$, and wishes to decide, as soon as
possible, whether $\{x(t)\}$ is $\{x_1(t)\}$ or $\{x_2(t)\}$. "As soon as possible" means the
following here. Let $T$ be the time when he reaches a decision (in general this
may be a chance variable and need not be a constant). Let $E_iT$ denote the expected
value of $T$ when $\{x(t)\} = \{x_i(t)\}$, $i = 1, 2$. Let $\alpha_1$, $\alpha_2$ be two positive
constants, $\alpha_1 + \alpha_2 < 1$. Subject to the requirement that the probability of an
incorrect decision when $\{x(t)\} = \{x_i(t)\}$ be at most $\alpha_i$, the problem is to give a
procedure for deciding between $\{x_1(t)\}$ and $\{x_2(t)\}$ such that $E_i(T)$ is a minimum
for $i = 1, 2$. This is simply the same formulation for stochastic processes with
continuous parameter as was originally given by Wald ([3], [4]) for stochastic
processes with a discrete parameter.

In this paper we shall limit ourselves to stochastic processes which fulfill
the following conditions. For every $t \ge 0$, $x(t)$ is a sufficient statistic for the
process, that is, the conditional distribution of the chance function $x(\tau)$, $0 \le \tau \le t$,
given $x(t)$, is, with probability one for every $t$, the same for the processes $\{x_1(t)\}$

Received 11/4/52. 

¹ This work was sponsored by the Office of Naval Research under a contract with
Columbia University and under a National Bureau of Standards contract with the University
of California at Los Angeles.


and $\{x_2(t)\}$. For every $t$ and $x$, both $x_1(t)$ and $x_2(t)$ have frequency functions,
say $f_1(x, t)$ and $f_2(x, t)$, respectively. Let

$$\psi(t) = \log\frac{f_2(x(t), t)}{f_1(x(t), t)} \qquad (\psi(0) = 0). \tag{1.1}$$

Finally we postulate that the $\psi(t)$ process is one of stationary independent
increments, that is, a) for every positive integral $k$, every $h > 0$, and every
sequence $t_1 < t_2 < \cdots < t_k \le t$, $\psi(t + h) - \psi(t)$ is distributed independently
of $\psi(t_1), \cdots, \psi(t_k)$; b) the distribution of $\psi(t + h) - \psi(t)$ depends only upon
$h$ and not upon $t$.

Thus our theory will include the following problems: 1) testing hypotheses
about the parameter of a continuous Poisson process with stationary independent
increments (to be discussed in detail below in Section 4); 2) testing hypotheses
about the mean of a Wiener process (to be discussed in detail below in Section 3);
3) testing hypotheses about the value of $p$ ($0 < p < 1$) in the following process
with stationary independent increments (called the negative binomial): the
probability that $x(t) = k$ for every nonnegative integer $k$ is

$$\Gamma(t + k)\,p^{t}(1 - p)^{k}/\Gamma(k + 1)\Gamma(t);$$

4) testing hypotheses about the value of $\theta$ ($\theta > 0$) in the following process
with stationary independent increments (called the Gamma process): the probability
density of $x(t)$ at $x$ ($x \ge 0$) is given by $x^{t-1}e^{-x/\theta}/\Gamma(t)\theta^{t}$.

In practice it is, of course, impossible to observe without error a sample function
of a continuous process such as the Poisson process or the Wiener process.
Yet in many cases these processes do constitute an excellent approximation to 
physical reality. For example, the incidence of mesons on a Geiger counter is 
generally assumed to follow a Poisson process. If the recording lag and the dead 
time of the Geiger counter are very small, a physicist could use the present 
theory to decide between two possible values of meson density. In this case 
continuous observation means simply exact registration of incidence times. As 
another example, our method, or a modification of it, may be applied to problems 
of life testing. 

Moreover, there are several distinct advantages of the continuous parameter 
procedure over the discrete one. These are as follows. 

The expected duration of observing the process before reaching a decision 
about which hypothesis to adopt can obviously only be shortened by allowing 
continuous observation. 

Moreover, there are many cases, notably the Poisson and Wiener processes, 
in which an exact determination of the optimal procedure is possible in the 
continuous case, while in the discrete case so far only approximations have been 
derived. Thus, even when treating the discrete case, the continuous case, which 
is easier to treat, may be used to derive approximations when the unit of time 
is small. 

There may also be other advantages in special problems. Thus it is seen in
Section 4 that in the continuous Poisson process the solution does not depend,
as in the discrete case, on the values of the two parameters $\lambda_1$ and $\lambda_2$, but only
on their ratio $\lambda_2/\lambda_1$.

2. Application of the Wald sequential procedure. Optimum character of the
test. A careful examination of the results of [5] and [6] shows that their conclusions
in no way require that the processes be discrete in time, and under the assumptions
about the processes made in the preceding section the following results hold.

i) Let $a$ and $b$, $b < 0 < a$, be given numbers, and let us employ the Wald
sequential probability ratio test as follows. As long as $\psi(t)$ lies between $b$ and $a$,
continue observing $\{x(t)\}$. As soon as $\psi(t) \le b$, stop observing $\{x(t)\}$ and decide
$\{x(t)\} = \{x_1(t)\}$. As soon as $\psi(t) \ge a$, stop observing $\{x(t)\}$ and decide $\{x(t)\} =
\{x_2(t)\}$. Let $\alpha_i(a, b)$ be the probability of error and $E_i(T \mid a, b)$ be the expected
value of $T$ when $\{x(t)\} = \{x_i(t)\}$, $i = 1, 2$. For any other procedure with respective
probabilities of error $\alpha_1^*$ and $\alpha_2^*$ and respective expected values $E_1^*T$ and
$E_2^*T$, we have that $\alpha_i^* \le \alpha_i(a, b)$, $i = 1, 2$ implies $E_i^*T \ge E_i(T \mid a, b)$, that is, the
optimum character of the Wald sequential probability ratio test (with respect to
all randomized as well as nonrandomized procedures).

ii) Let $c$, $W_1$ and $W_2$ be positive numbers, and let $g_i$ be the a priori probability
that $\{x(t)\} = \{x_i(t)\}$, $i = 1, 2$ (cf. remarks about a priori probability distributions
in [5] and [6]). There exist two numbers $a(c, W_1, W_2, g_1, g_2)$ and
$b(c, W_1, W_2, g_1, g_2)$ such that, if the statistician continues to observe $\{x(t)\}$
until either $\psi(t) \le b$ or $\psi(t) \ge a$, and then decides respectively that $\{x(t)\} =
\{x_1(t)\}$ or $\{x(t)\} = \{x_2(t)\}$, he will minimize $g_1(\alpha_1W_1 + cE_1T) + g_2(\alpha_2W_2 + cE_2T)$
with respect to all possible procedures for deciding between $\{x_1(t)\}$ and $\{x_2(t)\}$,
where $E_iT$ is the expected value of $T$ when $\{x(t)\} = \{x_i(t)\}$, $i = 1, 2$. (It is of
course assumed that $a \ge b$, with the equality sign not excluded. Also $\psi(0) = 0$.
Thus if $a = b$, or $a \le 0$ or $b \ge 0$, the decision will always be made at time $t = 0$.)

It is to be understood that any procedure which the statistician will employ
should be such that the quantities $\alpha_1$, $\alpha_2$, $E_1T$, and $E_2T$ will be well defined. The
consideration of questions of measurability is a little more involved for our
problem than it is in [5] and [6], but because of the assumptions on the processes
made in the preceding section it can be carried out without difficulty. We shall
therefore omit consideration of such questions.

From the remarks at the end of Section 1 and well known results of sequential
analysis (see Stein [2]), it follows that $E_iT^k < \infty$ for any sequential probability
ratio test and any positive $k$.

Other important results of sequential analysis established for discrete processes
apply also to the continuous parameter case. For example, let $\{z(t), t \ge 0\}$
($z(0) = 0$), be a process with stationary independent increments. Assume that
$Ez(1)$ exists and denote it by $h$. Suppose that one has any stopping rule, that is,
there is defined a positive chance variable $T$ such that the set of chance functions
for which $T = t$ is defined only by conditions on $z(\tau)$, $0 \le \tau \le t$. Then Wald's
equation ([3], [7])

$$Ez(T) = hE(T) \tag{2.1}$$

holds. Suppose also that $Ee^{uz(1)}$ exists for all real $u$, and denote it by $\varphi(u)$. Then
Wald's fundamental identity ([4], p. 159)

$$Ee^{uz(T)}(\varphi(u))^{-T} = 1 \tag{2.2}$$

holds for many stopping rules, including in particular the rule where $T = t$
if $z(t) \ge a$ or $z(t) \le b$, while $b < z(\tau) < a$ for $\tau < t$. Here $a$ and $b$ are constants,
$a > 0$, $b < 0$. The simplest way of proving these results is to derive them as immediate
consequences of a theorem of J. L. Doob on martingales with a continuous
parameter ([1], Chap. VII, Theorem 11.8). For (2.1) the martingale
process is $\{z(t) - ht\}$, and for (2.2) the martingale process is $\{e^{uz(t)}(\varphi(u))^{-t}\}$.
Another, more laborious way, of proving these results is to consider the process
$\{z(t)\}$ only at time intervals which are integral multiples of $\Delta$, proceed as in
[4] or [7], and then let $\Delta$ approach zero. This is, however, a laborious way of
proving a special case of the martingale theorem.

3. The Wiener process. Let $\{x_1(t)\}$ and $\{x_2(t)\}$ be Wiener processes ($t \ge 0$,
$x_1(0) = x_2(0) = 0$) each with a variance which without loss of generality we take
to be one per unit of time. Let $m_1$ and $m_2$ ($m_1 \ne m_2$) be the mean values per unit
time of $\{x_1(t)\}$ and $\{x_2(t)\}$, respectively. Thus we have the following situation:
$t \ge 0$ is a continuous (time) parameter. For any $a_1$, $a_2$ ($0 \le a_1 < a_2$), $x_i(a_2) -
x_i(a_1)$ is normally distributed with mean $m_i(a_2 - a_1)$ ($i = 1, 2$) and variance
$(a_2 - a_1)$. For any integral $k$ and sequence $a_1^1 < a_2^1 \le a_1^2 < a_2^2 \le \cdots \le a_1^k < a_2^k$,
the $k$ chance variables $x_i(a_2^j) - x_i(a_1^j)$, $j = 1, \cdots, k$, $i = 1, 2$, are independently
distributed. The statistician observes continuously, beginning at $t = 0$, a process
$\{x(t)\}$ which is either $\{x_1(t)\}$ or $\{x_2(t)\}$, and wishes to decide whether $\{x(t)\} =
\{x_1(t)\}$ or $\{x(t)\} = \{x_2(t)\}$.

At time $t$ the quantity $x(t)$ is sufficient for deciding between $\{x_1(t)\}$ and
$\{x_2(t)\}$, that is, it is unnecessary to know the previous history of the process.
The likelihood ratio $L(x(t), t)$ at time $t$ is given by

$$L(x(t), t) = \frac{\dfrac{1}{\sqrt{2\pi t}}\,e^{-\frac12((x(t) - m_2t)^2/t)}}{\dfrac{1}{\sqrt{2\pi t}}\,e^{-\frac12((x(t) - m_1t)^2/t)}}.$$

Hence

$$\psi(t) = x(t)(m_2 - m_1) - \frac{t}{2}(m_2^2 - m_1^2).$$


The sample functions of the $\{x(t)\}$ process are continuous with probability one.
We choose $a$ and $b$, $b < 0 < a$, such that the statistician will continue to observe
$\{x(t)\}$ only until $\psi(t) = a$ or $\psi(t) = b$. In the first case he will decide that $\{x(t)\} =
\{x_2(t)\}$, in the second case that $\{x(t)\} = \{x_1(t)\}$. We shall now find $\alpha_i(a, b)$,
$E_i(T \mid a, b)$, and the distribution function of $T$. The same problem for the discrete
stochastic process when one observes $\{x(t)\}$ only at $t = 1, 2, \cdots$ has been
studied by Wald ([3], [4]) who gave, inter alia, approximations for these quantities.
An examination of his argument shows that, in his problem, his results
are approximate only because he neglects the excess of $\psi(T)$ over $a$ or $b$. In our
problem this excess is zero with probability one, and Wald's formulae cease to be
mere approximations and become exact. Thus we have, for example, ([4], p. 50,
equation (3.42))

$$\alpha_1(a, b) = \frac{1 - e^{b}}{e^{a} - e^{b}} \tag{3.1}$$

$$\alpha_2(a, b) = \frac{e^{b}(e^{a} - 1)}{e^{a} - e^{b}} \tag{3.2}$$

For any Wiener process with variance one per unit of time, not necessarily
either $\{x_1(t)\}$ or $\{x_2(t)\}$, the probability that $\psi(t)$ will reach $b$ before reaching
$a$ is given exactly by [4], page 50, equation (3.43). Call this probability $H$. Then,
for any Wiener process with variance one per unit of time, not necessarily $\{x_1(t)\}$
or $\{x_2(t)\}$, $ET = (Hb + (1 - H)a)/h$ ([4], page 53, equation (3.57)). These
results can be derived from (2.1) and (2.2) by Wald's methods. Also the density
function of $T$ is given exactly by formula (A:194) on page 195 of [4].

In practice one has to find $a$ and $b$ to correspond to given values $\alpha_1$ and $\alpha_2$.
Solving (3.1) and (3.2) we obtain

$$a = \log\frac{1 - \alpha_2}{\alpha_1} \tag{3.3}$$

$$b = \log\frac{\alpha_2}{1 - \alpha_1} \tag{3.4}$$
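The passage between boundaries and error probabilities can be sketched in Python. The block below is an illustration only: `error_probabilities` codes (3.1) and (3.2), `boundaries` codes the solution $a = \log((1 - \alpha_2)/\alpha_1)$, $b = \log(\alpha_2/(1 - \alpha_1))$, and `expected_times` specializes $ET = (Hb + (1 - H)a)/h$ under the assumption (ours, obtained from the expression for $\psi(t)$) that $E\psi(1) = \mp\tfrac12(m_2 - m_1)^2$ under $\{x_1(t)\}$ and $\{x_2(t)\}$ respectively, with $H = 1 - \alpha_1$ and $H = \alpha_2$. All function names are hypothetical.

```python
import math


def error_probabilities(a, b):
    """Error probabilities (3.1) and (3.2) for boundaries b < 0 < a."""
    A, B = math.exp(a), math.exp(b)
    alpha1 = (1 - B) / (A - B)
    alpha2 = B * (A - 1) / (A - B)
    return alpha1, alpha2


def boundaries(alpha1, alpha2):
    """Boundaries a and b giving the prescribed error probabilities."""
    a = math.log((1 - alpha2) / alpha1)
    b = math.log(alpha2 / (1 - alpha1))
    return a, b


def expected_times(a, b, m1, m2):
    """Expected observation times under each hypothesis (assumes unit variance, m1 != m2)."""
    alpha1, alpha2 = error_probabilities(a, b)
    h = 0.5 * (m2 - m1) ** 2  # drift of psi under the second process; its negative under the first
    e1 = ((1 - alpha1) * b + alpha1 * a) / (-h)
    e2 = (alpha2 * b + (1 - alpha2) * a) / h
    return e1, e2


if __name__ == "__main__":
    a, b = boundaries(0.05, 0.10)
    print(a, b, error_probabilities(a, b), expected_times(a, b, 0.0, 1.0))
```

Because the excess over the boundaries is zero for the Wiener process, the round trip from $(\alpha_1, \alpha_2)$ to $(a, b)$ and back reproduces the error probabilities exactly, up to rounding.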


All of the above results are exact because the excess of $\psi(T)$ over the boundaries
$a$ and $b$ is zero with probability one. For the same reason one may already infer
the optimal character of the Wald sequential probability ratio test for testing
hypotheses about the mean of a Wiener process from the approximations and
heuristic arguments given by Wald on pages 196-199 of [4].

One may raise the question how to test hypotheses about the variance of a
Wiener process. However, a scrutiny of the problem shows that from a knowledge
of a sample function in any interval, no matter how small, one can, with probability
one, determine the variance to any arbitrary accuracy, so that the problem
is trivial. For suppose $\{x(t)\}$ is a Wiener process with mean value $m$ and variance
$v$, both per unit of time. Suppose the process has been observed from $t = 0$ to
$t = H_0$, where $H_0$ is any positive number. Let $N$ be any integer which will later
approach infinity, and write $t_i = iH_0/N$, $i = 0, 1, \cdots, N$. For any $i$ from 1 to $N$
we have

$$E(x(t_i) - x(t_{i-1}))^2 = \frac{vH_0}{N} + \frac{m^2H_0^2}{N^2}.$$

Now, for $i = 1, \cdots, N$, the chance variables

$$(x(t_i) - x(t_{i-1}))^2 - \frac{vH_0}{N} - \frac{m^2H_0^2}{N^2}$$

are identically and independently distributed, with variance of order $N^{-2}$ and
fourth moment of order $N^{-4}$. Hence the fourth moment of

$$\frac{\sum_{i=1}^{N}(x(t_i) - x(t_{i-1}))^2}{H_0} - v - \frac{m^2H_0}{N}$$

is of order $N^{-2}$. Consequently, for any $\epsilon > 0$ we have that

$$P\left\{\left|\frac{\sum_{i=1}^{N}(x(t_i) - x(t_{i-1}))^2}{H_0} - v\right| > \epsilon\right\} < \frac{C}{\epsilon^4N^2}$$

where $C$ is a suitable constant. Since the series $\sum N^{-2}$ converges it follows immediately
from the Borel-Cantelli lemma that $\bigl(\sum_{i=1}^{N}(x(t_i) - x(t_{i-1}))^2\bigr)/H_0$ converges
to $v$ with probability one as $N \to \infty$.
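The convergence of the normalized sum of squared increments can be illustrated by simulation. The sketch below is not part of the original argument; it draws independent normal increments with mean $m\,H_0/N$ and variance $v\,H_0/N$ using Python's standard `random` module, and the function name, the seed, and the parameter values are arbitrary choices made for the illustration.

```python
import math
import random


def estimate_variance(m, v, horizon, steps, seed=0):
    """Sum of squared increments over [0, horizon], divided by horizon."""
    rng = random.Random(seed)
    dt = horizon / steps
    total = 0.0
    for _ in range(steps):
        # increment of a Wiener process with mean m and variance v per unit of time
        increment = rng.gauss(m * dt, math.sqrt(v * dt))
        total += increment * increment
    return total / horizon


if __name__ == "__main__":
    # Even on the short interval [0, 0.01] the estimate is close to v = 2.
    for steps in (100, 10000, 100000):
        print(steps, estimate_variance(m=3.0, v=2.0, horizon=0.01, steps=steps, seed=1))
```

The estimate has mean $v + m^2H_0/N$ and standard deviation of order $v\sqrt{2/N}$, so refining the partition, rather than lengthening the observation interval, is what drives the error to zero.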

4. The Poisson process. In this section we treat the problem of deciding which
of two values given in advance represents the correct mean occurrence time of a
Poisson process with stationary independent increments.

The probability that a Poisson process with mean occurrence time $\lambda$ will
result in exactly $k$ occurrences between times $t = 0$ and $t = T$ is

$$e^{-\lambda T}\,\frac{(\lambda T)^k}{k!} \qquad (k = 0, 1, 2, \cdots). \tag{4.1}$$

Let $H_i$ ($i = 1, 2$) denote the hypothesis that $\lambda = \lambda_i$, where $\lambda_1$ and $\lambda_2$ are any
two different positive numbers. It is clear that the two corresponding processes
satisfy the conditions imposed in the introduction. Hence, given two positive
numbers $\alpha_1$, $\alpha_2$, ($\alpha_1 + \alpha_2 < 1$), the optimal test procedure for deciding between
$H_1$ and $H_2$ which satisfies the condition that the probability of a wrong decision
when $H_i$ is true does not exceed $\alpha_i$ ($i = 1, 2$) is given by a Wald sequential probability
ratio test.

More specifically, in view of (4.1) we have

$$\psi(t) = x(t)\log\frac{\lambda_2}{\lambda_1} + (\lambda_1 - \lambda_2)t. \tag{4.2}$$

Thus, assuming $\lambda_2 > \lambda_1$, the best decision rule is specified by two numbers
$a$, $b$ ($b < 0 < a$) in the manner described in the introduction.

Suppose now that $\alpha_1$ and $\alpha_2$ are the actual probabilities of error. According to
Wald ([4], p. 196) we have

$$\frac{1 - \alpha_2}{\alpha_1} = \frac{P_2(H_2)}{P_1(H_2)}, \qquad \frac{\alpha_2}{1 - \alpha_1} = \frac{P_2(H_1)}{P_1(H_1)}, \tag{4.3}$$

where $P_i(H_j)$ is the probability that hypothesis $H_j$ is accepted when hypothesis
$H_i$ is true. By the argument used by Wald we have

$$\inf_j e^{\psi(T)} \le \frac{P_2(H_j)}{P_1(H_j)} \le \sup_j e^{\psi(T)} \tag{4.4}$$

the $\sup_j$ and $\inf_j$ being taken over all values of $\psi(T)$ where the observation is
stopped at time $T$ with the decision to adopt $H_j$. In our case we know that if
the decision to accept $H_2$ is adopted at time $T$ we must have $\psi(T) \ge a$, while
$\psi(t) < a$ for $t < T$. Since (see (4.2)) $\psi(t + 0) - \psi(t) \le \log \lambda_2/\lambda_1$ with probability
1 we have from (4.3) and (4.4)

$$e^{a} \le \frac{1 - \alpha_2}{\alpha_1} \le \frac{\lambda_2}{\lambda_1}\,e^{a}. \tag{4.5}$$

Similarly if at time $T$ we decide to terminate observation and adopt $H_1$ we
must have $\psi(T) \le b$ and $\psi(t) > b$ for $t < T$. Since with probability 1 we have
$\psi(t) \ge \psi(t - 0)$ we find that $\psi(T) = b$ with probability 1. Therefore

$$\frac{\alpha_2}{1 - \alpha_1} = e^{b}. \tag{4.6}$$

We see here one of the advantages of continuous observation over observation
at discrete times only. If we were treating the problem in the conventional manner
we would have (4.6) replaced by an inequality, while only the first of the inequalities
(4.5) could be derived in the above manner.
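Thus we have

$$b = \log\frac{\alpha_2}{1 - \alpha_1} \tag{4.7}$$

and

$$\log\frac{\lambda_1}{\lambda_2} + \log\frac{1 - \alpha_2}{\alpha_1} \le a \le \log\frac{1 - \alpha_2}{\alpha_1}. \tag{4.8}$$

We now proceed to give a method for the exact computation of $a$. Without additional
effort we shall also find the power function of the test.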
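The exact lower boundary (4.7) and the bracket (4.8) for the upper boundary are simple to compute. The sketch below is illustrative, with hypothetical function names; it assumes $\lambda_2 > \lambda_1 > 0$ and also codes the log likelihood ratio $\psi(t) = x(t)\log(\lambda_2/\lambda_1) - (\lambda_2 - \lambda_1)t$ for an observed count.

```python
import math


def log_likelihood_ratio(count, t, lam1, lam2):
    """psi(t) for `count` occurrences observed up to time t, as in (4.2)."""
    return count * math.log(lam2 / lam1) - (lam2 - lam1) * t


def poisson_sprt_bounds(lam1, lam2, alpha1, alpha2):
    """Return b from (4.7) and the lower and upper limits for a from (4.8)."""
    b = math.log(alpha2 / (1 - alpha1))
    a_upper = math.log((1 - alpha2) / alpha1)
    a_lower = a_upper + math.log(lam1 / lam2)
    return b, a_lower, a_upper


if __name__ == "__main__":
    print(poisson_sprt_bounds(1.0, 2.0, 0.05, 0.10))
    print(log_likelihood_ratio(count=3, t=1.5, lam1=1.0, lam2=2.0))
```

The width of the bracket for $a$ is exactly $\log(\lambda_2/\lambda_1)$, the size of one upward jump of $\psi(t)$, which is the largest possible excess over the upper boundary.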
We put

$$R(t) = \frac{\psi(t)}{\log\dfrac{\lambda_2}{\lambda_1}} = x(t) - ct \tag{4.9}$$

where $c = (\lambda_2 - \lambda_1)/\log(\lambda_2/\lambda_1)$. Together with the process $\{x(t)\}$ we have to
consider also processes differing from it by a constant; that is, we consider processes
with arbitrary $x(0)$.

For given a and b, let V,(r) be the probability that the procedure described 
above will terminate with the adoption of H. when the Poisson parameter is 





SEQUENTIAL DECISION PROBLEMS 


really \ and R(O) = r. We then have 


(r — cAt (1 — Aat + o(At) 
R(At) = <r + 1 — cAt with probability At + o(At) 
\any other value 


o( At) 


\ 
where the o(At) terms are all smaller than \’A¢ for 0 < At < I/X. 
Putting 


(4.10) 


we have 
Vi(r) forr s 
Vi(r) for r 

while for $K < r < J$ we have

(4.11) $V_\lambda(r) = (1 - \lambda\,\Delta t)\,V_\lambda(r - c\,\Delta t) + \lambda\,\Delta t\,V_\lambda(r + 1 - c\,\Delta t) + o(\Delta t)$

with $|o(\Delta t)| < \lambda^2 (\Delta t)^2$ for $0 < \Delta t < 1/\lambda$. It follows at once that
$V_\lambda(r)$ is continuous for $K < r < J$. (It will be discontinuous at $r = J$.)
Rewriting (4.11) as

(4.12) $\dfrac{V_\lambda(r) - V_\lambda(r - c\,\Delta t)}{\Delta t} = -\lambda V_\lambda(r - c\,\Delta t) + \lambda V_\lambda(r + 1 - c\,\Delta t) + \dfrac{o(\Delta t)}{\Delta t}$

and letting $\Delta t \to 0$ we see that $V_\lambda(r)$ is differentiable in the interval
$K < r < J$ with the exception of the point $r = J - 1$ (in case $K < J - 1$). Thus
we have the difference-differential equation

(4.13) $c\,V_\lambda'(r) + \lambda V_\lambda(r) = \lambda V_\lambda(r + 1)$

for $K < r < J$ and $r \ne J - 1$. The unique solution in $K < r < J$ is determined
by the conditions: (i) $V_\lambda(r)$ continuous for $r < J$, (ii) $V_\lambda(K) = 0$,
(iii) $V_\lambda(r) = 1$ for $r \ge J$.

Let $n(r)$ be the integer such that

(4.14) $J - r - 1 \le n(r) < J - r.$


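It is easy to verify that, for $K < r < J$,

(4.15) $V_\lambda(r) = 1 + C e^{-(\lambda/c) r} \sum_{i=0}^{n(r)} \dfrac{(-1)^i}{i!} \left[ \dfrac{\lambda}{c}\,(J - r - i)\, e^{-\lambda/c} \right]^i$

satisfies (4.13) and (i) for every choice of the constant of integration C. To satisfy
also (ii) one has merely to choose

(4.16) $C = -e^{(\lambda/c) K} \left\{ \sum_{i=0}^{n(K)} \dfrac{(-1)^i}{i!} \left[ \dfrac{\lambda}{c}\,(J - K - i)\, e^{-\lambda/c} \right]^i \right\}^{-1}.$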
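The closed form lends itself to a direct numerical check. The Python sketch below assumes the reading $V_\lambda(r) = 1 + C e^{-(\lambda/c) r} \sum_{i=0}^{n(r)} \frac{(-1)^i}{i!} [\frac{\lambda}{c}(J - r - i) e^{-\lambda/c}]^i$ of (4.15), with C fixed by the condition $V_\lambda(K) = 0$. The function names, the finite-difference step and any parameter values are illustrative assumptions, not part of the paper.

```python
import math

def n_of(r, J):
    """The integer n(r) with J - r - 1 <= n(r) < J - r, as in (4.14)."""
    return math.ceil(J - r) - 1

def series(r, J, mu):
    """Alternating sum appearing in (4.15); mu stands for lambda / c."""
    return sum((-1) ** i / math.factorial(i)
               * (mu * (J - r - i) * math.exp(-mu)) ** i
               for i in range(n_of(r, J) + 1))

def prob_accept_h2(r, lam, c, K, J):
    """V_lambda(r): probability of ending with H_2 when R(0) = r."""
    if r >= J:
        return 1.0
    if r < K:
        return 0.0
    mu = lam / c
    # Constant of integration chosen so that V_lambda(K) = 0, as in (4.16).
    C = -math.exp(mu * K) / series(K, J, mu)
    return 1.0 + C * math.exp(-mu * r) * series(r, J, mu)

def residual(r, lam, c, K, J, h=1e-5):
    """c V'(r) + lam V(r) - lam V(r + 1), with V' by central differences."""
    dV = (prob_accept_h2(r + h, lam, c, K, J)
          - prob_accept_h2(r - h, lam, c, K, J)) / (2 * h)
    return (c * dV + lam * prob_accept_h2(r, lam, c, K, J)
            - lam * prob_accept_h2(r + 1, lam, c, K, J))
```

Away from the break points $r = J - i$, the residual of (4.13) should vanish up to discretization error, and the value at $r = 0$ is the probability of adopting $H_2$ at the start of the test.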







Putting r = 0 to represent the start of the actual probability ratio test as
used in applications, we have from (4.15) and (4.16) that the "OC" function
corresponding to the given values of $\lambda_1$, $\lambda_2$, $\alpha$ and $\beta$ is given by, say,


> (~ 1) 2 as ne i 
(4.17) a(*) - 1 — won be a [ou i) | 


c ¥ (= vt % [u - x02] 
v! c 


=0 


(K is not displayed since it is given explicitly by (4.10).) Now J should be de- 
termined so that 


Each of the equations (4.18) follows from the other and either one may be used 
to find J. 


It should be noticed that the dependence of K and J on $\lambda_1$ and $\lambda_2$ is only
through the ratio $\lambda_2/\lambda_1$. This follows from (4.10) and (4.17) and could also
have been foreseen from the nature of the problem. This remark is useful in the
numerical tabulation of the values of J and K or, equivalently, of a and b. (The
fact that the $\lambda_i$ are involved only through their ratio is due to the fact that
they are not attached to a given time-unit. In the discrete parameter problem
there is an absolute unit of time and hence the two $\lambda_i$ enter as two parameters.
The simplification mentioned above therefore does not occur.)

We now derive, in a manner similar to that used above, an expression for
the moment generating function $M_\lambda(u; r) = E e^{uT}$ of the observation time T
necessary to reach a decision when $R(0) = r$ and the true Poisson parameter is
$\lambda$. From a result of C. Stein [2] it follows that for given J, K and $\lambda$ there is a
positive number $u_0 = u_0(J, K, \lambda)$ such that $M_\lambda(u; r)$ is analytic and uniformly
bounded in r for each complex u with real part smaller than $u_0$. By definition
we have $M_\lambda(u; r) = 1$ for $r \le K$ or $r \ge J$. In the same way as (4.11) was derived
we obtain (for each u with real part smaller than $u_0$) for $K < r \le J - 1$


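$$M_\lambda(u; r) = (1 - \lambda\,\Delta t)\, E\{e^{uT} \mid R(0) = r,\ R(\Delta t) = r - c\,\Delta t\}
 + \lambda\,\Delta t\, E\{e^{uT} \mid R(0) = r,\ R(\Delta t) = r + 1 - c\,\Delta t\} + o(\Delta t)$$
$$= (1 - \lambda\,\Delta t)\, e^{u\,\Delta t}\, E\{e^{u(T - \Delta t)} \mid R(\Delta t) = r - c\,\Delta t\}
 + \lambda\,\Delta t\, E\{e^{uT} \mid R(\Delta t) = r + 1 - c\,\Delta t\} + o(\Delta t),$$

or

(4.19) $M_\lambda(u; r) = (1 - \lambda\,\Delta t + u\,\Delta t)\, M_\lambda(u; r - c\,\Delta t) + \lambda\,\Delta t\, M_\lambda(u; r + 1 - c\,\Delta t) + o(\Delta t).$

This form is also valid for $J - 1 < r < J$ since $M_\lambda(u; r + 1 - c\,\Delta t) = 1 + o(1)$
for $r > J - 1$. Since the $o(\Delta t)$ term and $M_\lambda(u; r)$ are uniformly bounded in r we
deduce, as in the case of $V_\lambda(r)$, that, considered as a function of r, $M_\lambda(u; r)$ is
continuous for $r < J$, possesses a derivative for $K < r < J$ and $r \ne J - 1$ and
satisfies in the last range the equation

(4.20) $c\,\dfrac{\partial}{\partial r} M_\lambda(u; r) + (\lambda - u)\, M_\lambda(u; r) = \lambda\, M_\lambda(u; r + 1).$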


It can be verified that the solution of (4.20) satisfying the required boundary 
conditions is given for K < r < J by 


n(r)+1 
My (u; r) = I) 


n(r) i 
(4.21) + Clue eee a ( — {u- > ae i) to ‘| a Au 


i=0 (A — u)? 


6 A—u)(J—r—L/e oe (re —(A—u) ‘y > 6 <i lune a 2 _— 1) A= 2] 
=0 A—-u j=0 c 
with $C(u)$ determined so that $M_\lambda(u; K) = 1$.
Let $Z_\lambda(r)$ be the expected length of time before a final decision is adopted.
Then $Z_\lambda(r) = (\partial/\partial u)\, M_\lambda(u; r)|_{u=0}$. Since $C(0) = 0$ in (4.21) we obtain, on
putting $C'(0) = C'$,


n(r) ‘ 
Z(r) = n(r) ~ + ("eo or = Sot) |o- — rer ‘| 


=O 


n(r)—1 


= 1 wow +) SS aes gov |r athe. A] 


A 1=0 7=0 
for $K < r < J$ (of course $Z_\lambda(r) = 0$ outside this range, and $C'$ is determined so
that $Z_\lambda(K) = 0$).
(One could derive (4.22) without using the moment generating function by
establishing the equation

$$c\,Z_\lambda'(r) + \lambda Z_\lambda(r) = 1 + \lambda Z_\lambda(r + 1)$$

for $K < r < J$, $r \ne J - 1$.)
If we write in a more explicit manner $Z_\lambda(r \mid \lambda_1, \lambda_2)$ for $Z_\lambda(r)$ with J and K
determined as explained above, it is easily seen that

(4.23) $Z_{a\lambda}(r \mid a\lambda_1, a\lambda_2) = \dfrac{1}{a}\, Z_\lambda(r \mid \lambda_1, \lambda_2)$

for every positive a.
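The scaling relation (4.23) can be illustrated by simulation. The following Python sketch is a Monte Carlo illustration under stated assumptions, not the paper's computational method: $R(t) = x(t) - ct$ starts at `r0` with $K < r_0 < J$, falls linearly at rate $c = (\lambda_2 - \lambda_1)/\log(\lambda_2/\lambda_1)$, and jumps by one at the events of a Poisson process of rate `lam`. The function name `simulate_test` and its argument order are hypothetical.

```python
import math
import random

def simulate_test(lam, lam1, lam2, K, J, r0, rng):
    """Run one test path; return (observation time, accepted hypothesis)."""
    c = (lam2 - lam1) / math.log(lam2 / lam1)
    r, t = r0, 0.0
    while True:
        wait = rng.expovariate(lam)      # time until the next Poisson event
        time_to_lower = (r - K) / c      # time for the drift to carry R down to K
        if wait >= time_to_lower:
            return t + time_to_lower, 1  # R reaches K exactly: adopt H_1
        t += wait
        r = r - c * wait + 1.0           # drift down, then jump up by one
        if r >= J:
            return t, 2                  # upper limit crossed by a jump: adopt H_2
```

Running the same random stream with all three rates multiplied by a positive constant a leaves every decision unchanged and divides every observation time by a, which is the content of (4.23).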







It is possible to treat the negative binomial process in a manner essentially 
the same in which we have treated the Poisson process above. A complication 
is caused by the fact that the probability that the chance variable will exceed 
one in a small time interval is of the same order of magnitude as the probability 
that the chance variable will be one. 


The authors are obliged to Professor J. L. Doob for several helpful remarks. 


REFERENCES

[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, 1953.
[2] C. Stein, "A note on cumulative sums," Ann. Math. Stat., Vol. 17 (1946), pp. 498-499.
[3] A. Wald, "Sequential tests of statistical hypotheses," Ann. Math. Stat., Vol. 16 (1945),
pp. 117-186.
[4] A. Wald, Sequential Analysis, John Wiley and Sons, 1947.
[5] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio
test," Ann. Math. Stat., Vol. 19 (1948), pp. 326-329.
[6] A. Wald and J. Wolfowitz, "Bayes solutions of sequential decision problems," Ann.
Math. Stat., Vol. 21 (1950), pp. 82-99.
[7] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Ann. Math. Stat., Vol. 18 (1947), pp. 215-230.





EQUIVALENT COMPARISONS OF EXPERIMENTS¹


By David Blackwell
Stanford University and Howard University


1. Summary. Sherman [8] and Stein [9] have shown that a method given by 
the author [1] for comparing two experiments is equivalent, for experiments with 
a finite number of outcomes, to the original method introduced by Bohnenblust, 
Shapley, and Sherman [4]. A new proof of this result is given, and the restriction 
to experiments with a finite number of outcomes is removed. A class of weaker 
comparisons—comparison in k-decision problems—is introduced, in three equiv- 
alent forms. For dichotomies, all methods are equivalent, and can be described 
in terms of errors of the first and second kinds. 

2. Introduction. An ordered collection $\alpha = (m_1, \cdots, m_n)$ of probability
measures on a Borel field $\mathfrak{B}$ of subsets of a space X will be called an experiment.
Any pair $(\alpha, A)$, where A is a closed bounded convex subset of n-space, corresponds
to a decision problem as follows. A point $x \in X$ is selected according to
one of the distributions $m_i$; the statistician observes x and then chooses an action
d from a given set D, incurring a loss $L(i, d)$. If we associate with d the vector
$w(d) = (L(1, d), \cdots, L(n, d))$, the range of $w(d)$ as d varies over D is the set A
associated with the problem. Thus we may replace D by A, and suppose that the
statistician chooses a point $a = (a_1, \cdots, a_n) \in A$, incurring loss $a_i$ when $m_i$ is
the distribution of x. By using randomized decision procedures we increase A to
its convex hull, and for simplicity we suppose A closed and bounded as well as
convex.

A decision function in the problem $(\alpha, A)$ is a $\mathfrak{B}$-measurable function f from
X into A, specifying for each x the action $a = f(x)$ to be taken when x is observed.
When $m_i$ is the distribution of x, the expected loss from f is
$v_i(f) = \int a_i(x)\, dm_i(x)$; the vector $v(f) = (v_1(f), \cdots, v_n(f))$ is called the loss
vector of f, and the range of $v(f)$ as f varies over all decision functions in the
problem $(\alpha, A)$ will be denoted by $B(\alpha, A)$. The set $B(\alpha, A)$ will be a closed, bounded,
convex subset of n-space [2].

For two experiments $\alpha$, $\beta$ with the same n, following Bohnenblust, Shapley,
and Sherman, we say that $\alpha$ is more informative than $\beta$, written $\alpha \supset \beta$, if for every
A we have $B(\alpha, A) \supset B(\beta, A)$, that is if every loss vector attainable in problem
$(\beta, A)$ is also attainable in $(\alpha, A)$. For any experiment $\alpha = (m_1, \cdots, m_n)$, let
$p_i(x)$ be the density of $m_i$ with respect to $\sum_1^n m_j$, let $p(x) = [p_1(x), \cdots, p_n(x)]$,
and let $m_\alpha$ denote the distribution of $p(x)$ when x has distribution $\sum_1^n m_i/n$.
Then $m_\alpha$ is a probability measure defined on the set P of all vectors
$p = (p_1, \cdots, p_n)$ with $p_i \ge 0$ and $\sum_1^n p_i = 1$, and

(1) $\int p_i\, dm_\alpha = 1/n;$

the center of gravity of $m_\alpha$ is the point $(1/n, \cdots, 1/n)$. The measure $m_\alpha$ is called
by Bohnenblust, Shapley, and Sherman the standard measure associated with
the experiment $\alpha$. Their basic results connecting $m_\alpha$ and $\supset$ are summarized as
Theorem 1 below (for a proof see [1]).

Received 11/17/52.
¹ This paper was prepared under an Office of Naval Research Contract.

THEOREM 1. Every probability measure on P with property (1) is the standard
measure of some experiment; two experiments $\alpha$ and $\beta$ have the same standard
measure if and only if $B(\alpha, A) = B(\beta, A)$ for all A; $\alpha \supset \beta$ if and only if for every
continuous convex function $\phi(p)$ on P, $\int \phi\, dm_\alpha \ge \int \phi\, dm_\beta$.


An alternative method of comparing two experiments $\alpha$, $\beta$, introduced by the
author [1], can best be described in terms of the concept of stochastic
transformation. If $\mathfrak{B}$, $\mathfrak{C}$ are Borel fields of subsets of X, Y respectively, a stochastic
transformation T is a function $Q(x, E)$ defined for all $x \in X$ and $E \in \mathfrak{C}$ which for
fixed E is a $\mathfrak{B}$-measurable function of x and for fixed x is a probability measure
on $\mathfrak{C}$. For any probability measure m on $\mathfrak{B}$, the function $M(E) = \int Q(x, E)\, dm(x)$
is a probability measure on $\mathfrak{C}$, denoted by Tm. If X, Y are Borel sets in n-space
and $\mathfrak{B}$, $\mathfrak{C}$ are the Borel subsets of X, Y, T is called mean-preserving if

$$\int y\, dQ(x, y) = x \text{ for all } x.$$


If $\alpha = (m_1, \cdots, m_n)$ and $\beta = (M_1, \cdots, M_n)$ are two experiments, with
$m_i$, $M_i$ defined on Borel fields $\mathfrak{B}$, $\mathfrak{C}$ of X, Y respectively, we shall say that $\alpha$
is sufficient for $\beta$, written $\alpha \succ \beta$, if there is a stochastic transformation T with
$T m_i = M_i$ for $i = 1, \cdots, n$. Thus $\alpha \succ \beta$ means that the statistician, observing
the result x of $\alpha$, can, by selecting y according to $Q(x, E)$, obtain a result
equivalent to the result of observing $\beta$.

The concept $\succ$ also has a description in terms of standard measures,
summarized in

THEOREM 2. [1]. $\alpha \succ \beta$ if and only if there is a mean-preserving stochastic
transformation T with $T m_\beta = m_\alpha$.

If $\alpha \succ \beta$ and $\phi$ is any continuous convex function on P,

$$\int \phi\, dm_\alpha = \iint \phi(p)\, dQ(q, p)\, dm_\beta(q)
 \ge \int \phi\!\left( \int p\, dQ(q, p) \right) dm_\beta(q)
 = \int \phi\, dm_\beta,$$

so that, from Theorem 1 we obtain

THEOREM 3. $\alpha \succ \beta$ implies $\alpha \supset \beta$.
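For experiments with finitely many outcomes the inequality behind Theorem 3 can be checked directly. The Python sketch below is an illustration only: it represents an experiment by a Markov matrix whose rows are the measures $m_1, \cdots, m_n$, builds the standard measure as a list of atoms in P, and compares integrals of a convex function before and after a stochastic transformation given by a Markov matrix M. The helper names `standard_measure`, `integrate` and `garble` are hypothetical.

```python
def standard_measure(P):
    """Atoms (point in P, mass) of the standard measure of the Markov matrix P."""
    n = len(P)
    atoms = []
    for j in range(len(P[0])):
        column = [P[i][j] for i in range(n)]
        total = sum(column)
        if total > 0:
            # Densities of the m_i with respect to their sum, at outcome j;
            # the outcome has probability total / n under the average measure.
            atoms.append((tuple(x / total for x in column), total / n))
    return atoms

def integrate(phi, atoms):
    """Integral of phi with respect to a finitely supported measure."""
    return sum(mass * phi(point) for point, mass in atoms)

def garble(P, M):
    """Matrix product PM: the experiment obtained from P through M."""
    return [[sum(P[i][k] * M[k][j] for k in range(len(M)))
             for j in range(len(M[0]))] for i in range(len(P))]
```

With Q = `garble(P, M)`, the integral of any convex function over the standard measure of P should be at least the corresponding integral for Q, in line with Theorems 1 and 3.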

The converse of Theorem 3 has been proved, for experiments with a finite 
number of outcomes, by Sherman [8] and Stein [9]. In Section 3 we give a new 
proof of the Sherman-Stein theorem, and in Section 4 extend the theorem to 
arbitrary experiments. 

3. The Sherman-Stein Theorem. If the space X of outcomes of the experiment
$\alpha$ is finite, consisting say of $x_1, \cdots, x_N$, then $\alpha$ is characterized by the $n \times N$
Markov matrix $P = \| p_{ij} \|$, where $p_{ij} = m_i(x_j)$, and conversely every Markov
matrix can be interpreted as an experiment. For two Markov matrices P, Q
with the same n, we write $P \supset Q$, $P \succ Q$ if the corresponding experiments are
related by $\supset$, $\succ$ respectively.

THEOREM 4. If P, Q are $n \times N_1$, $n \times N_2$ Markov matrices with $P \supset Q$, then
for every $N_2 \times n$ matrix D there is an $N_1 \times N_2$ Markov matrix M with

$$\text{Trace}\,(PMD) \le \text{Trace}\,(QD).$$

PROOF. Let A be the convex hull of the rows of D. The decision function f
in problem $(Q, A)$ selecting the jth row of D when j is observed has
$v_i(f) = \sum_j q_{ij} d_{ji}$, the ith diagonal element of QD.

Since $P \supset Q$, there is a decision function g in problem $(P, A)$, selecting say
$a_j \in A$ when j is observed, with $v_i(g) = \sum_j p_{ij} a_{ji} = v_i(f)$ for all i. Since $a_j \in A$,
there are nonnegative numbers $m_{jk}$ with $\sum_k m_{jk} = 1$ such that $a_{ji} = \sum_k m_{jk} d_{ki}$
for all i. Thus $v_i(g) = \sum_{j,k} p_{ij} m_{jk} d_{ki}$, which is the ith diagonal element of
PMD. It follows that M has not only the property asserted in the theorem but
the stronger property that PMD and QD have identical diagonal elements.

THEOREM 5. $P \succ Q$ if and only if there is a Markov matrix M with $PM = Q$.

This is simply a restatement of the definition of $\succ$ for the special case
$X = (1, \cdots, N_1)$, $Y = (1, \cdots, N_2)$, since a stochastic transformation becomes
simply an $N_1 \times N_2$ Markov matrix.

THEOREM 6. (Sherman-Stein theorem). $P \supset Q$ implies $P \succ Q$.

PROOF. Consider the function $h(D, M) = \text{Trace}\,(Q - PM)D$, as M varies
over all $N_1 \times N_2$ Markov matrices and D varies over all $N_2 \times n$ matrices with
$0 \le d_{kj} \le 1$ for all k, j. Since h is bilinear and the ranges of D, M are closed,
bounded, and convex, h has a saddle point [3], that is there exist $D_0$, $M_0$ with
$h(D_0, M) \ge h(D_0, M_0) \ge h(D, M_0)$ for all D, M. From Theorem 4, there is an
M with $h(D_0, M) \le 0$, so that $h(D, M_0) \le 0$ for all D. Writing $U = Q - PM_0$,
we have

$$\text{Trace}\,(UD) \le 0 \text{ for all } D,$$

so that $u_{ik} \le 0$ for all i, k. Since U is the difference of two Markov matrices,
$\sum_k u_{ik} = 0$, so that $u_{ik} = 0$ for all i, k and $PM_0 = Q$. Thus by Theorem 5,
$P \succ Q$.

An alternative form of the Sherman-Stein theorem is

THEOREM 7. If $m_1$ and $m_2$ are any two probability measures on a finite subset
X of n-space such that for every continuous convex $\phi$ defined on the convex hull of X,
$\int \phi\, dm_1 \ge \int \phi\, dm_2$, then there is a mean-preserving stochastic transformation T
with $T m_2 = m_1$.

From Theorems 1 and 2, Theorem 7 implies Theorem 6. Theorem 7 was proved
for n = 1 by Hardy, Littlewood, and Polya [6], for n = 2 without the restriction
that X be finite by the author, and in the form given here by Sherman [8] and
Stein [9].

PROOF OF THEOREM 7. From Theorems 1 and 2, Theorem 6 implies Theorem 7
if $X \subset P$ and the common center of gravity of $m_1$, $m_2$ is $(1/n, \cdots, 1/n)$, since
in this case $m_1$, $m_2$ are the standard measures of experiments. Imbedding X in
n + 1 space and performing an appropriate linear transformation reduces the
general case in n-space to that of standard measures in n + 1 space and
completes the proof.

A direct proof of Theorem 7, using the methods of Theorem 6 and not
appealing to Theorems 1 and 2 can be given.

4. Equivalence of $\supset$ and $\succ$. In this section we extend Theorem 7, replacing
the requirement that X be finite by the weaker requirement that X be bounded.
For any two probability measures m, M on a bounded subset X of n-space,
we write $M \supset m$ if for every continuous convex $\phi$ on the convex hull of X
$\int \phi\, dM \ge \int \phi\, dm$, and $M \succ m$ if there is a mean-preserving stochastic
transformation (abbreviated m.p.s.t.) T with $Tm = M$. We shall prove

THEOREM 8. If $M \supset m$, then $M \succ m$.

The method of proof consists of approximating m, M by measures concentrated
on finite sets and using Doob's martingale convergence theorems. We first
prove

A. There exist sequences of measures $m_N$, $M_N$ each concentrated on a finite set,
with $m_N \prec m_{N+1} \subset m \subset M \prec M_{N+1} \prec M_N$ for all N, and for every open set O


my(O) > m(O), M,y(0O) — M(O) as N > ~. 


PROOF. For any n-vector $a = (a_1, \cdots, a_n)$ with integral coordinates, let
$C(N, a)$ denote the cube consisting of all $t = (t_1, \cdots, t_n)$ with
$2^{-N} a_i \le t_i < 2^{-N}(a_i + 1)$, let $Z(N, a)$ be the center of gravity of m on $C(N, a)$
and let $m_N$ assign to $Z(N, a)$ measure $m(C(N, a))$. It is easily verified that $m_N$
has the required properties.

To define $M_N$, let $Q_N(t, E)$ for $t \in C(N, a)$ concentrate on the $2^n$ vertices of
$C(N, a)$, assigning to vertex $2^{-N}(a_1 + \epsilon_1, \cdots, a_n + \epsilon_n)$, where $\epsilon_i = 0$ or 1,
measure $b_1 b_2 \cdots b_n$, where $b_i = a_i + 1 - 2^N t_i$ if $\epsilon_i = 0$ and $b_i = 2^N t_i - a_i$ if
$\epsilon_i = 1$. The function $Q_N(t, E)$ is a m.p.s.t. $U_N$, and if we define $M_N = U_N M$,
we have also $M_N = U_N M_{N+1}$, so that $M_N$ has the required properties.

B. There exist sequences $T_N$, $V_N$, $W_N$ of m.p.s.t. each from a finite set of n-space
to a finite set of n-space with

(a) $m_{N+1} = T_N m_N$, (b) $M_{N-1} = V_N M_N$, (c) $M_N = W_N m_N$,
(d) $W_N = V_{N+1} W_{N+1} T_N$.


PROOF. From A there exist sequences $T_N$ and $V_N$ with properties (a) and (b).
Also from A, $m_N \subset M_N$, so that, from Theorem 7, there is a m.p.s.t. $Y_N$ from a
finite set to a finite set with $M_N = Y_N m_N$. For $D > N$, write

$$Y_{ND} = V_{N+1} \cdots V_D\, Y_D\, T_{D-1} \cdots T_N,$$

so that

$$Y_{ND} = V_{N+1}\, Y_{N+1,D}\, T_N \text{ for } D > N + 1,$$

and

$$M_N = Y_{ND}\, m_N \text{ for } D > N.$$

Let $D \to \infty$ through a subsequence for which $Y_{ND}$ converges for all N, say to
$W_N$. Then $W_N$ satisfies (c) and (d).
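The two approximations used in A can be made concrete in one dimension. The Python sketch below is a simplified illustration, not the construction of the paper in full generality: it works on the line only, takes a finitely supported measure given as a list of (point, mass) pairs, and applies both approximations to the same measure. The names `lower_approx`, `upper_approx` and `integrate` are hypothetical.

```python
import math
from collections import defaultdict

def lower_approx(measure, N):
    """m_N: move the mass of each dyadic cell of side 2**-N to its center of gravity."""
    scale = 2 ** N
    mass = defaultdict(float)
    moment = defaultdict(float)
    for t, w in measure:
        a = math.floor(t * scale)        # index of the cell containing t
        mass[a] += w
        moment[a] += w * t
    return [(moment[a] / mass[a], mass[a]) for a in sorted(mass)]

def upper_approx(measure, N):
    """M_N: spread each point over the two end points of its cell, keeping the mean."""
    scale = 2 ** N
    out = defaultdict(float)
    for t, w in measure:
        a = math.floor(t * scale)
        frac = t * scale - a             # weight given to the right end point
        out[a / scale] += w * (1.0 - frac)
        out[(a + 1) / scale] += w * frac
    return sorted((x, w) for x, w in out.items() if w > 0)

def integrate(phi, measure):
    """Integral of phi with respect to a finitely supported measure."""
    return sum(w * phi(t) for t, w in measure)
```

For any convex function the integrals over `lower_approx` should increase with N toward the integral over the original measure, and those over `upper_approx` should decrease toward it, while all of these measures keep the same mean.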

PROOF OF THEOREM 8. We specify the joint distribution of two sequences
$x_1, x_2, \cdots$, $y_1, y_2, \cdots$, of n-dimensional chance variables by

C. For any N, the variables $x_1, \cdots, x_N, y_N, \cdots, y_1$ form a Markov chain
in the order written. The distribution of $x_1$ is $m_1$ and the conditional distributions
of $x_{i+1}$ given $x_i$, $y_N$ given $x_N$, and $y_{i-1}$ given $y_i$, are specified by $T_i$, $W_N$, and
$V_i$ respectively.

Part (d) of B guarantees that the requirements C are consistent, and
Kolmogorov's extension theorem [7] then asserts the existence of $x_1, x_2, \cdots$,
$y_1, y_2, \cdots$, with property C. Parts (a), (b), (c) of B imply that $x_N$, $y_N$ have
distributions $m_N$, $M_N$ respectively. Also the sequence

$$x_1, x_2, \cdots, \cdots, y_2, y_1$$

forms a martingale [5] in the order written; by Doob's martingale theorem [5],
$x_N \to x^*$, $y_N \to y^*$ as $N \to \infty$, and $E(y^* \mid x^*) = x^*$. From A, $x^*$ and $y^*$ have
distributions m, M respectively, so that $Q(x, E) = \text{Prob}\,\{y^* \in E \mid x^* = x\}$ is a
m.p.s.t. T with $Tm = M$. This completes the proof.

5. k-decision problems. In this section we introduce a comparison somewhat
weaker than $\succ$. The following lemma will be useful.

LEMMA. For any experiment $\alpha$ and any closed, bounded set C with convex hull A,
$B(\alpha, A)$ = convex hull of $B(\alpha, C)$.

PROOF. Since both $B(\alpha, A)$ and $B(\alpha, C)$ are closed and $B(\alpha, A)$ is convex
[2], it suffices to show that every $v(f) \in B(\alpha, A)$ can be approximated by points
in the convex hull of $B(\alpha, C)$. We may suppose that f assumes only a finite
number of values $a_1, \cdots, a_N$, since every f can be approximated by f's of
this kind. Say

$$S_j = \{f(x) = a_j\}, \qquad a_j = \sum_{i=1}^{r} \lambda_{ji} c_i, \quad c_i \in C, \quad \lambda_{ji} \ge 0, \quad \sum_{i=1}^{r} \lambda_{ji} = 1.$$

For any $h = (h_1, \cdots, h_N)$, $1 \le h_j \le r$, define

$$f(h) = c_{h_j} \text{ for } x \in S_j, \qquad \lambda(h) = \prod_{j=1}^{N} \lambda_{j h_j}.$$

Then $v(f(h)) \in B(\alpha, C)$, and $\sum_h \lambda(h)\, v[f(h)]$ has for its sth coordinate

$$\sum_h \lambda(h) \sum_j \int_{S_j} c_{h_j s}\, dm_s
 = \sum_j m_s(S_j) \Big( \sum_h c_{h_j s}\, \lambda(h) \Big)
 = \sum_j m_s(S_j) \sum_i c_{is}\, \lambda_{ji}
 = \sum_j a_{js}\, m_s(S_j)$$

= sth coordinate of $v(f)$.
This completes the proof.

APPLICATION 1. Let $\alpha$ be any experiment, let $S = (S_1, \cdots, S_k)$ be any partition of
X into k disjoint $\mathfrak{B}$-measurable sets, let $P(S)$ be the $n \times k$ Markov matrix with
$p_{ij} = m_i(S_j)$, let $\mathcal{P}^*_{\alpha k}$ be the range of $P(S)$, and let $\mathcal{P}_{\alpha k}$ be the set of all $n \times k$
Markov matrices P which have the property $\alpha \succ P$. Then $\mathcal{P}_{\alpha k}$ is the convex hull
of $\mathcal{P}^*_{\alpha k}$.

This is the special case of the lemma applied to the experiment $\alpha'$ consisting
of nk measures $M_{ij}$ with $M_{ij} = m_i$ for $j = 1, \cdots, k$ and C consisting of the
k $n \times k$ Markov matrices $P_1, \cdots, P_k$, where $P_j$ has the jth column identically
1 and the remaining columns identically zero.

APPLICATION 2. For any experiment $\alpha$ and any closed bounded convex set A
which is the convex hull of the set of k points $d_1, \cdots, d_k$, $B(\alpha, A)$ is the range of
diag PD as P varies over $\mathcal{P}_{\alpha k}$, where diag U for any $n \times n$ matrix $U = \| u_{ij} \|$
denotes the n-vector $(u_{11}, u_{22}, \cdots, u_{nn})$ and D is the $k \times n$ matrix whose rows
are $d_1, \cdots, d_k$.

If C consists of $d_1, \cdots, d_k$, and f is any decision function in $(\alpha, C)$, say
$S_j = \{f = d_j\}$. Then the sth coordinate of $v(f)$ is

$$\sum_j m_s(S_j)\, d_{js},$$

so that

$$v(f) = \text{diag}\, P(S) D.$$

Thus $B(\alpha, C)$ = range of diag PD as P varies over $\mathcal{P}^*_{\alpha k}$. From the lemma, the
convex hull of $B(\alpha, C)$ is $B(\alpha, A)$, and from Application 1 the convex hull of
the range of diag PD as P varies over $\mathcal{P}^*_{\alpha k}$ is the range of diag PD as P varies
over $\mathcal{P}_{\alpha k}$.

THEOREM 9. Let $\alpha$, $\beta$ be two experiments with the same n. The following
conditions are equivalent:

(1) $\mathcal{P}_{\alpha k} \supset \mathcal{P}_{\beta k}$.
(2) For every A which is the convex hull of a set of k points, $B(\alpha, A) \supset B(\beta, A)$.
(3) For every convex function $\phi$ on n-space which is the maximum of k linear
functions, $\int \phi\, dm_\alpha \ge \int \phi\, dm_\beta$.

PROOF. Suppose (1) and let $v \in B(\beta, A)$, where A is the convex hull of
$d_1, \cdots, d_k$. Then v = diag PD for some $P \in \mathcal{P}_{\beta k}$.
Since $\mathcal{P}_{\beta k} \subset \mathcal{P}_{\alpha k}$, v = diag PD for some $P \in \mathcal{P}_{\alpha k}$ and $v \in B(\alpha, A)$. Thus (1)
implies (2).

Now suppose (2) and let $P \in \mathcal{P}_{\beta k}$. Then for any closed bounded convex set R,
let $v \in B(P, R)$, say $v = v(f)$, where $f(j) = r_j \in R$, $j = 1, \cdots, k$. Then
$v \in B(P, R^*)$, where $R^*$ is the convex hull of $r_1, \cdots, r_k$. Since
$B(P, R^*) \subset B(\beta, R^*) \subset B(\alpha, R^*)$, $v \in B(\alpha, R^*)$ and consequently $v \in B(\alpha, R)$.
Thus $\alpha \supset P$ for any $P \in \mathcal{P}_{\beta k}$, and, by Theorem 8, $\alpha \succ P$. Since $\mathcal{P}_{\alpha k}$ contains all
$n \times k$ Markov matrices P with $\alpha \succ P$, $P \in \mathcal{P}_{\alpha k}$ and $\mathcal{P}_{\beta k} \subset \mathcal{P}_{\alpha k}$. Thus (2)
implies (1).

In considering (3), we use the fact that the standard measure $m_P$ of an $n \times k$
Markov matrix P is concentrated on k points, which follows immediately from
the definition. Suppose (3), let $\phi$ be the maximum of any finite set $\mathcal{L}$ of linear
functions, and let $P \in \mathcal{P}_{\beta k}$. There is a $\psi$, the maximum of k functions in $\mathcal{L}$, which
agrees with $\phi$ on the k points on which $m_P$ is concentrated. Then
$\int \phi\, dm_\alpha \ge \int \psi\, dm_\alpha \ge \int \psi\, dm_\beta \ge \int \psi\, dm_P = \int \phi\, dm_P$, so that from Theorems 1
and 8, $\alpha \succ P$. Thus $P \in \mathcal{P}_{\alpha k}$, $\mathcal{P}_{\beta k} \subset \mathcal{P}_{\alpha k}$ and (3) implies (1).

Finally, suppose (1) and let $\phi = \max\,(L_1, \cdots, L_k)$; say

$$U_j = \{L_j(p) = \phi(p),\ L_i(p) < \phi(p) \text{ for } i < j\}.$$

If $S_j = \{p(x) \in U_j\}$, $S = (S_1, \cdots, S_k)$ is a partition of X and the experiment
$P = P(S)$ associated with $\beta$ and S (see Application 1) has a standard measure
$m_P$ with

$$m_P(U_j) = m_\beta(U_j),$$

so that

$$\int \phi\, dm_\beta = \int \phi\, dm_P.$$

Since $P \in \mathcal{P}_{\beta k}$, (1) implies $P \in \mathcal{P}_{\alpha k}$, so that
$\int \phi\, dm_\alpha \ge \int \phi\, dm_P = \int \phi\, dm_\beta$. This completes the proof.

If two experiments $\alpha$, $\beta$ with the same n satisfy any of the three equivalent
conditions of Theorem 9, we shall say that $\alpha$ is more informative than $\beta$ for
k-decision problems, written $\alpha \succ_k \beta$. Condition (2) is the direct analogue of $\supset$, and
condition (1) is analogous to $\succ$, since it requires that every experiment with k
outcomes producible from $\beta$ is also producible from $\alpha$. Clearly $\succ_{k+1}$ implies $\succ_k$,
and if $\alpha \succ_k \beta$ for all k, then $\alpha \succ \beta$, since $\alpha \succ_k \beta$ for all k implies
$\int \phi\, dm_\alpha \ge \int \phi\, dm_\beta$ for every $\phi$ which is the maximum of a finite number of linear
functions and hence, by approximation, for every continuous convex $\phi$. An
alternative statement is: if every experiment with a finite number of outcomes which
is producible from $\beta$ is also producible from $\alpha$, then $\beta$ is itself producible from $\alpha$.

Stein (unpublished paper) has shown that in general $\succ_{k+1}$ is actually stronger
than $\succ_k$. For n = 2, however, all $\succ_k$ for $k \ge 2$ are equivalent.

THEOREM 10. If $\alpha$ and $\beta$ are two experiments with n = 2, then $\alpha \succ_2 \beta$ implies
$\alpha \succ \beta$.

PROOF. For n = 2, the standard measures $m_\alpha$ and $m_\beta$ are defined on the line
segment $p_i \ge 0$, $p_1 + p_2 = 1$. On this line segment, every function $\phi$ which is the
maximum of a finite number of linear functions is representable as $\sum a_i \phi_i$,
where $a_i > 0$ and each $\phi_i$ is a maximum of two linear functions. Consequently
$\alpha \succ_2 \beta$ implies $\alpha \succ_k \beta$ for all k and hence $\alpha \succ \beta$.
COROLLARY. Let A be the line segment joining (0, 1) and (1, 0). If
$B(\alpha, A) \supset B(\beta, A)$, then $\alpha \succ \beta$.
PROOF. For any line segment $A'$ in the plane, there is a transformation

$$L: \quad x' = ax + b, \qquad y' = cy + d$$

with $LA = A'$. Since $LB(\alpha, A) = B(\alpha, LA)$ and similarly for $\beta$, we have
$B(\alpha, A') \supset B(\beta, A')$, so that $\alpha \succ_2 \beta$ and consequently $\alpha \succ \beta$.

For the A of the corollary, the boundary of the set $B(\alpha, A)$ consists of two
curves, joining (0, 1) and (1, 0), one of which is the reflection of the other about
(1/2, 1/2). Denote by $f_\alpha(t)$ the minimum of u for which $(t, u) \in B(\alpha, A)$. Then
$\alpha \succ \beta$ if and only if $f_\alpha(t) \le f_\beta(t)$ for all t, $0 \le t \le 1$. The function $f_\alpha(t)$ is a
nonincreasing convex function of t, representing the minimum attainable error of
the second kind when the error of the first kind is fixed at t. Thus an alternative
statement of the corollary is:

$\alpha$ is more informative than $\beta$ if and only if at every level t the error of the second
kind with $\alpha$ is less than or equal to the corresponding error with $\beta$.
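For a dichotomy with finitely many outcomes, the curve of minimum second-kind error can be computed by ordering the outcomes by likelihood ratio and randomizing on the boundary outcome, in the manner of the Neyman-Pearson lemma. The Python sketch below is an illustration under that assumption; the function name `min_second_kind_error` and the representation of an experiment by two lists of outcome probabilities are hypothetical choices.

```python
def min_second_kind_error(p1, p2, t):
    """Least error of the second kind when the error of the first kind equals t.

    p1[j], p2[j] are the probabilities of outcome j under the first and
    second hypothesis; the first hypothesis is rejected on the outcomes
    with the largest ratio p2[j] / p1[j], randomizing on the last one.
    """
    def ratio(j):
        return p2[j] / p1[j] if p1[j] > 0 else float("inf")

    order = sorted(range(len(p1)), key=ratio, reverse=True)
    remaining, power = t, 0.0
    for j in order:
        if p1[j] == 0:
            power += p2[j]               # rejecting here costs no first-kind error
            continue
        share = min(1.0, remaining / p1[j])
        power += share * p2[j]
        remaining -= share * p1[j]
        if remaining <= 0:
            break
    return 1.0 - power
```

Comparing two dichotomies then amounts to evaluating this function for both on a grid of levels t and checking that one curve never lies above the other.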

Since if $\alpha \succ \beta$, then an experiment with n independent observations with
$\alpha$ is more informative than the corresponding experiment with $\beta$ [1], we obtain


THEOREM 11. If for a sample of size 1 at every level t the probability of an error
of the second kind with $\alpha$ does not exceed the corresponding probability for $\beta$, then
the same is true for every sample size.


REFERENCES

[1] David Blackwell, "Comparison of experiments," Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1951, pp. 93-102.
[2] David Blackwell, "The range of certain vector integrals," Proc. Amer. Math. Soc.,
Vol. 2 (1951), pp. 390-395.
[3] H. F. Bohnenblust, S. Karlin, and L. S. Shapley, "Games with continuous convex
payoff," Contributions to the Theory of Games, Princeton University Press, 1950,
pp. 181-192.
[4] H. F. Bohnenblust, L. S. Shapley, and S. Sherman, Unpublished paper.
[5] J. L. Doob, "Continuous parameter martingales," Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1951, pp. 269-277.
[6] G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University
Press, 1934.
[7] A. N. Kolmogorov, "Grundbegriffe der Wahrscheinlichkeitsrechnung," Ergebnisse der
Mathematik, No. 3, 1933.
[8] S. Sherman, "On a theorem of Hardy, Littlewood, Polya, and Blackwell," Proc. Nat.
Acad. Sci., U.S.A., Vol. 37 (1951), pp. 826-831.
[9] Charles Stein, "Notes on the Comparison of Experiments," (mimeographed),
University of Chicago, 1951.





TESTING ONE SIMPLE HYPOTHESIS AGAINST ANOTHER 


By Lionel Weiss
University of Virginia

1. Summary and introduction. For the problem of testing one simple hypothesis
against another, of all tests whose probabilities of incorrectly accepting the
first hypothesis and of incorrectly accepting the second hypothesis are bounded
from above by given bounds, the familiar Wald sequential probability ratio test
gives the smallest expectation of sample size under either hypothesis. In this
paper, a "generalized sequential probability ratio test" is introduced which
differs from the Wald test only in that the same limits (A, B in the usual
notation) are not necessarily used at each stage of the sampling, but at the ith stage
$A_i$ and $B_i$ are used, where these numbers are predetermined constants. It is
shown that for any given test T, there is a generalized sequential probability
ratio test G whose probabilities of incorrectly accepting either hypothesis are no
larger than the corresponding probabilities for T, and such that the cumulative
distribution function of the number of observations required to come to a
decision when using G is never below the corresponding distribution function
when using T, under either hypothesis. We may then say that "G is uniformly
better than T."


2. Assumptions and notation. In this paper we deal with the problem of testing
one simple hypothesis $H_1$ against another simple hypothesis $H_2$. We assume
that under $H_i$ the chance variable X has a distribution with density function
$f_i(x)$. Both $f_1(x)$ and $f_2(x)$ are everywhere bounded and have at most a finite
number of discontinuities. We make the test by means of a sequence of
independent chance variables $(X_1, X_2, \cdots)$, each having the same distribution
(the density function of each is $f_i(x)$ under $H_i$). We assume that for any n and
any finite nonzero c,

$$\int \cdots \int \prod_{j=1}^{n} f_i(x_j)\, dx_j \to 0 \text{ as } \Delta c \to 0, \text{ for } i = 1, 2,$$

the region of integration being

$$c \le \frac{\prod_{j=1}^{n} f_2(x_j)}{\prod_{j=1}^{n} f_1(x_j)} \le c + \Delta c.$$

The only tests we shall consider are those not involving randomization, and
such that in the space of the first n chance variables the regions where $H_1$ is
accepted and $H_2$ is accepted are Borel sets, for any n.

We define a "generalized sequential probability ratio test" as follows. There
are two sequences of predetermined nonnegative constants $(A_1, A_2, \cdots)$ and
$(B_1, B_2, \cdots)$ such that $A_i \ge B_i$ for all i. The value $\infty$ is not excluded. As long as

$$B_n < \frac{\prod_{j=1}^{n} f_2(x_j)}{\prod_{j=1}^{n} f_1(x_j)} < A_n$$

we continue sampling. The first time that this does not happen, we accept $H_2$
if the upper bound was violated, accept $H_1$ if the lower bound was violated.
If $A_m = B_m$, while for all $i < m$ we have $A_i > B_i$, the test is truncated at the
mth step. In general, any test is said to be truncated at the mth step if the
probability of continuing sampling beyond the mth observation is zero under
either hypothesis when using the test.

Received 7/3/52.
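The stopping rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the limits are supplied as functions `A(n)` and `B(n)` of the stage n, the densities `f1` and `f2` are positive at every observed value, and the function name `generalized_sprt` is hypothetical.

```python
def generalized_sprt(observations, f1, f2, A, B):
    """Apply stage-dependent limits B(n) < ratio < A(n) to the probability ratio.

    Returns (decision, n), where decision is "H1" or "H2" and n is the number
    of observations used, or (None, n) if the data run out before the ratio
    leaves the continuation interval.
    """
    ratio = 1.0
    n = 0
    for n, x in enumerate(observations, start=1):
        ratio *= f2(x) / f1(x)           # running product of f2(x_j) / f1(x_j)
        upper, lower = A(n), B(n)
        if lower < ratio < upper:
            continue                     # still inside the limits: keep sampling
        return ("H2" if ratio >= upper else "H1"), n
    return None, n
```

Taking `A(n)` and `B(n)` constant gives the ordinary Wald test, and setting `A(m) == B(m)` at some stage m forces a decision there, which is the truncation described above.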

We use the following notation. 

$T: D_i(n)$ is the probability that the sample size required to come to a decision
is less than or equal to n, when the test T is used and $H_i$ is true.

$T: A_i(n)$ is the region in the space of $(X_1, X_2, \cdots, X_n)$ where we accept $H_i$
when the test T is used. To be in this region, we must have taken an nth
observation.

$T: A_i$ is the region in the $\infty$-dimensional space where we accept $H_i$ when
using the test T.

$T: C(n)$ is the region in the space of $(X_1, X_2, \cdots, X_n)$ where we continue
sampling when using the test T.

$P_i(R)$ is the probability of falling in any region R when $H_i$ is true.

$P_i(R \mid S)$ is the conditional probability of falling in R, given that we are in S,
when $H_i$ is true.

$Q(X, n)$ is $\prod_{j=1}^{n} f_2(x_j) / \prod_{j=1}^{n} f_1(x_j)$. In specifying that we are using a certain
test T, we shall not keep repeating the symbol "T:" throughout an expression,
but shall use it once at the beginning of the expression and understand that it
modifies everything coming after it, until we reach a symbol denoting another
test. Thus if T and $T'$ are two tests, the inequality
$T: P_1(A_1(m)) + P_1(A_2(m)) > T': P_1(A_1)$ means that the probability of coming to
a decision at the mth step when using T and $H_1$ is true exceeds the probability of
accepting $H_1$ when it is true and $T'$ is used.


3. Existence of a sequence of generalized sequential probability ratio tests
uniformly better than a given test in the limit.
THEOREM 1. If T is any test of $H_1$ against $H_2$ such that

$$\lim_{n \to \infty} T: D_i(n) = 1 \text{ for } i = 1, 2,$$

there is a sequence $(G_1, G_2, \cdots)$ of generalized sequential probability ratio tests
such that

$$G_j: D_i(n) \ge T: D_i(n) \text{ for all } n, \text{ all } j, \text{ and } i = 1, 2; \text{ and}$$

$$\lim_{j \to \infty} G_j: P_i(A_i) \ge T: P_i(A_i) \text{ for } i = 1, 2.$$





PROOF. (At certain points in the proof, our statements should really be modified
for certain sets of probability zero under both $H_1$ and $H_2$. The fact that we have
neglected to do this in no way affects the proof.) To prove the theorem, we form
a sequence of tests $(T_1, T_2, \cdots)$ as follows. $T_j$ coincides with T until the jth
observation. If a jth observation is taken, $T_j$ says accept $H_2$ if $Q(X, j) \ge 1$,
else accept $H_1$. Then we have

$$T_j: D_i(n) \ge T: D_i(n) \text{ for all } n, \text{ all } j, \text{ and } i = 1, 2; \text{ and}$$

$$\lim_{j \to \infty} T_j: P_i(A_i) = T: P_i(A_i) \text{ for } i = 1, 2.$$

Now for each j, we will replace $T_j$ by a generalized sequential probability ratio
test $G_j$, such that

(4.1) $G_j: D_i(n) \ge T_j: D_i(n)$ for all n, all j, and i = 1, 2; and
(4.2) $G_j: P_i(A_i) \ge T_j: P_i(A_i)$ for all j and for i = 1, 2.

This, with an obvious application of the Bolzano-Weierstrass theorem, will
complete the proof of our theorem. Whenever two tests T and $T'$ stand in the
same relation to each other as do $G_j$ and $T_j$ in (4.1) and (4.2) we shall write
$T\,{*}D{*}\,T'$. Thus we can state (4.1) and (4.2) more concisely as $G_j\,{*}D{*}\,T_j$ for all j.

Take any integer $j$ and hold it fixed. Let us assume that for some integer $m$
above 1 but not exceeding $j$ we know that for any given test $T_j$ truncated at the
$j$th step there is a test $T'_j(m)$, also truncated at the $j$th step and coinciding with
$T_j$ before the $m$th observation, such that $T'_j(m)*D*T_j$, and also $T'_j(m)$ has the
property $W(m)$ defined as follows:

$T'_j(m):A_1(n)$ is given by $Q(X, n) \le B_n$;

and

$T'_j(m):A_2(n)$ is given by $Q(X, n) \ge A_n$,

for all $n$ between $m$ and $j$ inclusive. We shall then show that all this is true for
$m - 1$. Since it is certainly true for $m = j$ (with $A_j = B_j$, since a decision must
be reached by the $j$th step), by working back to $m = 1$ we will obtain $G_j$ and
thus complete the proof of the theorem.
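The property $W(m)$ describes a test that stops as soon as the ratio $Q(X, n)$ leaves an interval whose endpoints may change with $n$. The following is a minimal sketch of such a rule, not the author's construction: the function names, the Bernoulli ratio, and the numerical bounds are all hypothetical, and $Q$ is assumed to be the likelihood under $H_2$ divided by the likelihood under $H_1$.

```python
from typing import Callable, Sequence, Tuple


def generalized_sprt(
    xs: Sequence[float],
    ratio: Callable[[Sequence[float]], float],
    upper: Sequence[float],
    lower: Sequence[float],
) -> Tuple[int, int]:
    """Apply the bounds B_n <= A_n to the ratio Q(X, n) after each observation.

    Returns (decision, n): decision 2 if Q(X, n) >= A_n, decision 1 if
    Q(X, n) <= B_n, and decision 0 if the data run out before a decision.
    """
    for n in range(1, len(xs) + 1):
        q = ratio(xs[:n])
        if q >= upper[n - 1]:   # acceptance region A_2(n)
            return 2, n
        if q <= lower[n - 1]:   # acceptance region A_1(n)
            return 1, n
    return 0, len(xs)


def bernoulli_ratio(xs: Sequence[float], p1: float = 0.3, p2: float = 0.7) -> float:
    """Hypothetical Q(X, n): likelihood under H_2 over likelihood under H_1."""
    q = 1.0
    for x in xs:
        q *= (p2 / p1) if x == 1 else ((1 - p2) / (1 - p1))
    return q


# Illustrative bounds only; A_3 = B_3 forces a decision at the third step,
# which mirrors truncation at the jth observation.
A = [10.0, 5.0, 1.0]
B = [0.1, 0.2, 1.0]
print(generalized_sprt([1, 1, 1], bernoulli_ratio, A, B))
print(generalized_sprt([0, 1, 0], bernoulli_ratio, A, B))
```

Setting the last upper and lower bounds equal is what guarantees termination in this sketch; with constant bounds the rule reduces to the ordinary sequential probability ratio test.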

If in the space of the first $m - 1$ chance variables we consider only those
points for which we stop sampling at the $(m - 1)$st observation, we can always
transfer points so that in $T_j:A_1(m - 1)$ we have $Q(X, m - 1) \le c$, while in
$T_j:A_2(m - 1)$ we have $Q(X, m - 1) > c$, for some nonnegative $c$, without
making the distribution of the sample size or the probability of accepting a true
hypothesis less favorable in any respect. This simply requires the application
of the Neyman-Pearson lemma to the set of points under consideration. We shall
assume that this is done. Suppose that we then find that there is a number $r$
such that the subset $S_I$ of $T_j:C(m - 1)$ where $Q(X, m - 1) > r$ and the subset
$S_{II}$ of $T_j:A_2(m - 1)$ where $Q(X, m - 1) \le r$ are both nonempty. We assume
that $P_1(S_I) > 0$, else we would incorporate $S_I$ into $T_j:A_2(m - 1)$, which could
not make the situation less favorable. Similarly, we may assume $P_2(S_{II}) > 0$,
and hence $P_1(S_{II}) > 0$, else we incorporate $S_{II}$ into $T_j:A_1(m - 1)$. With these
assumptions, $c < r < \infty$. Then we can find a number $R$, with $c < R < \infty$,
such that if $s_I$ is the subset of $T_j:C(m - 1)$ where $Q(X, m - 1) > R$, and $s_{II}$ is
the subset of $T_j:A_2(m - 1)$ where $Q(X, m - 1) \le R$, we have $P_1(s_I) =
P_1(s_{II}) > 0$. It is clear that $s_I$ and $s_{II}$ are Borel sets. From now on, when we
write $X$ we shall understand the generic point $(x_1, x_2, \cdots, x_{m-1})$ of $m - 1$
dimensional space. To each point $X_{II}$ of $s_{II}$ we assign a nonnegative number
$r(X_{II})$ as follows: $r(X_{II})$ is the greatest lower bound of the set of numbers $v$ such
that $P_1[X \text{ in } s_I \text{ and } Q(X, m - 1) \le v] \ge P_1[X \text{ in } s_{II} \text{ and } Q(X, m - 1) \le
Q(X_{II}, m - 1)]$.

Now let us assume that we are using a test $T'_j(m)$ defined above, where the
acceptance and continuation regions from the first to the $(m - 1)$st observation
are given by $T_j$. We modify this test into $T''_j(m)$ as follows. Transfer $s_I$ into
$T''_j(m):A_2(m - 1)$, $s_{II}$ into $T''_j(m):C(m - 1)$, and for any $X$ in $s_{II}$, we act in
the future as though $Q(X, m - 1)$ were equal to $r(X)$. In all other respects,
$T''_j(m)$ coincides with $T'_j(m)$. We shall show that $T''_j(m)*D*T'_j(m)$. Note that
$T''_j(m)$ does not in general have the property $W(m)$.

We need the following lemma. For any given $u$,

$P_1[X \text{ in } s_I \text{ and } Q(X, m - 1) \le u] = P_1[X \text{ in } s_{II} \text{ and } r(X) \le u]$.

It clearly suffices to prove the lemma for $u$ between g.l.b. $Q(X, m - 1)$ for
$X$ in $s_I$ and l.u.b. $Q(X, m - 1)$ for $X$ in $s_I$. The proof of the lemma is given in
five short sections.

(1) Given any point $X'$ in $s_{II}$, we have from the definition of $r(X')$:

$P_1[X \text{ in } s_I \text{ and } Q(X, m - 1) \le r(X')] = P_1[X \text{ in } s_{II} \text{ and } Q(X, m - 1) \le Q(X', m - 1)]$.

(2) $P_1[X \text{ in } s_{II} \text{ and } r(X) = u] = 0$. For suppose $r(X') = r(X'') = u$, and
$Q(X', m - 1) < Q(X'', m - 1)$. Then we must have $P_1[X \text{ in } s_{II} \text{ and }
Q(X', m - 1) \le Q(X, m - 1) \le Q(X'', m - 1)] = 0$, else we could not have
$r(X') = r(X'')$, by (1). We define

$Q_1$ = g.l.b. $Q(X, m - 1)$ for $X$ in $s_{II}$ and $r(X) = u$,
$Q_2$ = l.u.b. $Q(X, m - 1)$ for $X$ in $s_{II}$ and $r(X) = u$.

Then $P_1[X \text{ in } s_{II} \text{ and } Q_1 \le Q(X, m - 1) \le Q_2] \ge P_1[X \text{ in } s_{II} \text{ and } r(X) = u]$.
Since $P_i[Q(X, m - 1) \le c]$ is a continuous function of $c$ for $c$ in the open interval
$(0, \infty)$, and we can clearly assume that we accept $H_2$ as soon as $Q(X, n) = \infty$
and accept $H_1$ as soon as $Q(X, n) = 0$, we have $P_1[X \text{ in } s_{II} \text{ and } Q_1 \le
Q(X, m - 1) \le Q_2] = 0$, which proves the first sentence of (2).

(3) $r(X)$ is a nondecreasing function of $Q(X, m - 1)$, and therefore if $X'$
is in $s_{II}$ the set of points in $s_{II}$ such that $Q(X, m - 1) \le Q(X', m - 1)$ is the
set of points in $s_{II}$ such that $r(X) \le r(X')$ (ignoring sets of probability zero
under $H_1$).

(4) If there is a point $X'$ in $s_{II}$ such that $r(X') = u$, we have $P_1[X \text{ in } s_I \text{ and }
Q(X, m - 1) \le u] = P_1[X \text{ in } s_{II} \text{ and } r(X) \le u]$, by (1) and (3). By continuity,
the same thing is true if there is a sequence of points $(X_1, X_2, \cdots)$ in $s_{II}$ such
that $\lim r(X_i) = u$.

(5) For any $u$ not of the type discussed in (4), we define $B(u)$ = l.u.b. $r(X)$
for all $X$ with $r(X) < u$, $b(u)$ = g.l.b. $r(X)$ for all $X$ with $r(X) > u$. We have

$P_1[X \text{ in } s_I \text{ and } Q(X, m - 1) \le B(u)] = P_1[X \text{ in } s_{II} \text{ and } r(X) \le B(u)]$,
and
$P_1[X \text{ in } s_I \text{ and } Q(X, m - 1) \le b(u)] = P_1[X \text{ in } s_{II} \text{ and } r(X) \le b(u)]$.
But
$P_1[X \text{ in } s_{II} \text{ and } r(X) \le B(u)] = P_1[X \text{ in } s_{II} \text{ and } r(X) \le b(u)]$,

and therefore our lemma is proved.

First we examine what has occurred when H, is true. Since 77 (m) and 7T;(m) 
coincide until the (m — 1)*t observation is taken, we start our investigation at 
the (m — 1)*t observation. Also, 77(m):A,(m — 1) is the same set as 
T’(m):A\(m — 1). Since P,(s;) = Pi(s;;), we have T; (m):Py(Ao(m —1))= 
T’(m):P\(Ae(m — 1)). Now choose any number k between m and j inclusive. 


We shall show that. 77 (m):P\(A,(k)) = T;(m):P,(A,(k)), ¢ = 1, 2. This will 
complete the proof that 77 (m)*D*Tj(m) when H, is true. For any set S, we 
denote the complement by S. We have 


T” (m):P\(A,(k)) 

= P,(8.:-C(m — 1))Pi(A.k) | 8n-C(m — 1)) + Pilsn-C(m — 1)-A,(k)), 
and 
T)(m):P,(A(k)) 

= P,(8,-C(m — 1))Pi(A,(k) | 3,-C(m — 1)) + Pils:-C(m — 1)-A,(k)). 


But corresponding terms in the expressions on the right of the two equations 
are equal to each other, therefore the two left sides are equal. The only terms 
on the right for which equality is not obvious are Tj(m):P,(s;-C(m — 1)-A,(k)) 
and T”(m):P,(sr1-C(m — 1)-A,(k)). These are equal to T3(m):P,(A ,(k)-s,) and 
T”’ (m):P,(A (k)-s;:) respectively. To show that these two latter expressions 
are equal to each other, we define Y,[u, A:(k), V] to be the probability when Hy; 
is true of falling in V:A,(k) when we arbitrarily assume that Q(X, m — 1) = u 
and then start sampling, using the test V as though the first observation were 
the mth, the second were the (m + 1)*, ete. Clearly, ¥i{u, A,(k), 7'j(m)] and 
¥,{u, A,(k), 17 (m)] are equal to each other for all u in the open interval (0, ~), 
and are continuous on this interval. We also define Fi(u, s;) as P,[X in s; and 







Q(X, m — 1) S wu), and G,(u, s;;) as Pi{X in s;; and r(X) S uj]. We know from 
the lemma that F,(u, s;) = G,(u, 8;;) identically in u. We can assume that 
T;(m) is such that the sets [X in s; and Q(X, m — 1) = O] and [X in s, and 
Q(X, m — 1) = @] are empty. Then F;(u, s;) is continuous at 0 and «. We 
have 


T3(m):PXAQ(t)-2) = | ” ¥ifu, Ak), T’(m)] dF y(u, 8), 


Tj (m):P\(AAk)-8y) = i Y,[u, Ak), T7(m)] dG,(u, 81). 


From the considerations above, we know that these Stieltjes integrals exist and 
are equal to each other. Thus we have shown that 7} (m):P,(A,(k)) is equal to 
T;(m):P,(A,(k)) for i = 1, 2 and for any k between m and j inclusive. 

Now we examine the situation when H; is true. Once again, we can start our 
investigation at the (m — 1)* observation. We have 7)(m):P.(C(m — 1)) = 
P2(s;) + P2(8:;-C(m — 1)), and Tj(m):P(C(m — 1)) = P2ls1) + 
P.(8;;-C(m — 1)). But the second terms on the right of these two equalities are 
equal, while P2(s;) > P2(s;;), since Pi(s;) = P1(s;;), and in s;, Q(X,m — 1) > R, 
while in s;; , Q(X, m — 1) S R. Now we take any k between m and j inclusive, 
and examine the expressions 


T j(m):P2(C(k)) 

= P2(s;-C(k)) + P2(8,;-C(m — 1))P2(C(k) | &:-C(m — 1)), 
7’; (m): P2(C(k)) 

= P2(8,;-C(k)) + Pe(81-C(m — 1))P2(C(k) | §:2-C(m — 1)). 


The second terms on the right of these two equalities are equal to each other. 
We investigate the first terms on the right. In a notation that will be recognized 
by analogy with that already used, we have 


T'(m):Pa(si-C(k)) = | Yalu, CC), Ti(m)] aFs(u, 81), 
0 


= 


T; (m):P2(sy-C(k)) = Y.[u, C(k), T7(m)] dG.(u, sx). 
“0 


But dF.2(u, s;) > dG2(u, s;,;) for all u, because dF\(u, s;) = dG,(u, s7,), while 
dF,(u, s;) > RdF,(u, s;) and dG,(u, s,;) < RdG,(u, s;;) for all u. Also, 
Y.{u, C(k), Tj(m)] = Ya{u, C(k), T7(m)] for all u. Therefore we find that 
Tj(m):P2(C(k)) > T7 (m):P2(C(k)). To complete the proof that 77 (m)*D*T}(m), 
we have to show that T; (m): P2(Ag) => T3(m):P2(A2), or that 


T 5 (m): Po(81-811° Ae) + P2(s1) + P2(8r1-A2) 







> T}(m):P2(81+811°A2) + Po(8r1) + Po(sr- Az) 


or, since the first terms on the two sides of the inequality are equal, that P2(s,;) — 
Tj(m):P2(s,;-A2) = P2(81)) — T3(m):P2(81,- Az), or 


| dF .(u, 8;) _ | Y2[u, Ag , T5(m)) dF ,(u, 8;) 


> | dG(u, 81) — | Yalu, Ae, T7(m)] dGx(u, 81), 


or that 


o .@ 


[ (1 — Yofu, Az, Tj(m)]) dF,(u, s;) = (1 — Y.{u, Ar, T7(m)]) dG2(u, 8), 
and this last inequality is immediately seen to hold. 

By the assumption made above, there is a test T j(m) coinciding with 75 (m) 
before the mth observation, having the property W(m), and such that 
i j(m)*D*T% (m). Now we transfer points between T (m):A,(m — 1) and 
7',(m):C(m — 1) so that after the transfer, for any X left in T,(m):A,(m — 1) 
we have Q(X, m — 1) S S, and for any X left in T(m):C(m — 1) we have 
Q(X,m — 1) > S, where 0 < S < c. Then, when S is properly chosen, we can 
show exactly as above that we can define a test T4(m), coinciding with 7’ ;(m) 
before the mth observation, such that 7j(m)*D*T ;(m). Using the assumption 
made above again, there is a test 7,(m) having the property W(m), coinciding 
with T(m) before the mth observation, and such that 7 j(m)*D*T)(m). But 
then we have 7 ,(m)*D*Tj(m), and also T,(m):A,(m — 1) is of the form 
Q(X, m — 1) S S, T;(m):A.(m — 1) is of the form Q(X, m — 1) = R, and 
T (m):C(m — 1) is of the form S < Q(X, m — 1) < R. Thus the existence of 
T (m) shows that if our assumption holds starting from the mth observation, 
it also holds starting from the $(m - 1)$st. Since it holds at the $j$th observation,
the theorem is proved. (Note that we were able to carry out the proof no matter
what the acceptance and continuation regions were before the $(m - 1)$st
observation.)


4. Existence of a generalized sequential probability ratio test uniformly better
than a given test.

THEOREM 2. If $T$ is any test of $H_1$ against $H_2$ satisfying the assumptions of
Sections 2 and 3, then there is a generalized sequential probability ratio test $G$ such
that $G*D*T$.

PROOF. We start with the sequence of generalized sequential probability
ratio tests $(G_1, G_2, \cdots)$ of Theorem 1. From this sequence we can choose a
subsequence so that the sequence of $A_1$ associated with the subsequence of tests
converges (convergence to $\infty$ is allowed throughout this proof). From this
subsequence of tests we choose a second subsequence so that the associated sequence
of $B_1$ converges. From this second subsequence of tests we choose a third
subsequence so that the associated sequence of $A_2$ converges. We continue this
way in an obvious manner. Then we form a new sequence of tests consisting of
the first test in the first subsequence, the second test in the second subsequence,
$\cdots$, the $i$th test in the $i$th subsequence, $\cdots$. Denote this new sequence by
$(S_1, S_2, \cdots)$. Define $G$ to be the generalized sequential probability ratio test
given by the two sequences of bounds $(A_1^*, A_2^*, \cdots)$, $(B_1^*, B_2^*, \cdots)$, where
$A_i^* = \lim_{j\to\infty}$ ($A_i$ associated with $S_j$), $B_i^* = \lim_{j\to\infty}$ ($B_i$ associated with $S_j$).
By our construction, these limits exist. To see that $G*D*T$, it suffices to note
that $S_j:D_i(n) \ge T:D_i(n)$ for all $n$, all $j$, and $i = 1, 2$; and $\lim_{j\to\infty} S_j:P_i(A_i) \ge
T:P_i(A_i)$ for $i = 1, 2$; and also that for any generalized sequential probability
ratio test, the probabilities of falling in the various acceptance and continuation
regions under either hypothesis are continuous functions of the associated
bounds in the two sequences which characterize the test. (Note that any generalized
sequential probability ratio test accepts $H_1$ as soon as $Q(X, n)$ becomes
zero, and accepts $H_2$ as soon as $Q(X, n)$ becomes infinite.)


5. Relation of results to decision theory. The relation of the results of this 
paper to general decision theory is fairly clear. In decision theory we are given 
a loss function, which we shall assume depends only on the true hypothesis, the 
hypothesis chosen as correct, and the number of observations required to come 
to a decision. We shall write this loss function as W(H, D, N), where H is the 
true situation and can equal either 1 or 2, D is the decision as to which hy- 
pothesis is correct and can also equal either 1 or 2, and N is the number of 


observations required to come to a decision. We also make the following reasonable
assumptions about the loss function: $W(1, 2, N) \ge W(1, 1, N)$ for all $N$,
$W(2, 1, N) \ge W(2, 2, N)$ for all $N$, and $W(i, j, N)$ is nondecreasing in $N$ for
any fixed $i$ and $j$. Then the discussion of the previous sections shows that if $T$
is any test, there is a generalized sequential probability ratio test $G$ such that

$G:P_i(W(i, D, N) \le w) \ge T:P_i(W(i, D, N) \le w)$ for all $w$ and for $i = 1, 2$.


6. Concluding remarks. The restriction to tests not using randomization that
we made above is not necessary. For suppose $R$ is any test, with or without
randomization, such that $\lim_{n\to\infty} R:D_i(n) = 1$ for $i = 1, 2$. Truncating $R$ at
the $j$th observation in the usual way, we get a test $R(j)$ such that

$R(j):D_i(n) \ge R:D_i(n)$ for all $n$, all $j$, and $i = 1, 2$; and
$\lim_{j\to\infty} R(j):P_i(A_i) = R:P_i(A_i)$ for $i = 1, 2$.

Theorem 5.1 of [1] tells us that there exists a nonrandomized test $T(j)$ such
that $T(j):P_i(A_k(n)) = R(j):P_i(A_k(n))$ for all $n$ and for $i = 1, 2$, $k = 1, 2$.
From this, it is easy to see that Theorems 1 and 2 hold if we consider randomized
tests.

Also, the restriction that the density functions be bounded can be dropped, 
and the results still hold. 







Finally, similar results hold in those cases where the observations are not 
taken one at a time, but in groups of predetermined size. 


7. Acknowledgment. The author wishes to thank Dr. Milton Sobel for several 
suggestions which made this paper more readable. 


REFERENCE

[1] A. DVORETZKY, A. WALD, AND J. WOLFOWITZ, "Elimination of randomization in certain
statistical decision procedures and zero-sum two-person games," Ann. Math.
Stat., Vol. 22 (1951), pp. 1-21.





ON THE EXACT EVALUATION OF THE VARIANCES AND COVARIANCES
OF ORDER STATISTICS IN SAMPLES FROM THE EXTREME-VALUE
DISTRIBUTION¹


By JULIUS LIEBLEIN


National Bureau of Standards 


Summary. This paper develops explicit closed formulas for the covariances of 
order statistics in samples from the extreme-value distribution which involve 
only tabulated functions. Such results do not appear to have been given pre- 
viously. They have been used in an investigation of the estimation of extreme- 
value parameters by means of order statistics which will be presented in a fuller 
report to be submitted to the National Advisory Committee for Aeronautics. 


1. Problem. We are concerned with random samples of size $n$ from the
"extreme-value" distribution whose cdf is

(1.1) $F(x) = \exp(-e^{-(x-u)/\beta}), \quad \beta > 0, \quad -\infty < x < \infty.$

(This distribution was derived as a limiting form of the distribution of the
largest value in a sample by Fisher and Tippett [1] and has been extensively
studied by Gumbel (e.g. [4], [5]). However, this paper is not concerned with the
extremal properties of this distribution.) If the $n$ values after ordering in size
are denoted by

$x_1, x_2, \cdots, x_n, \quad x_1 \le x_2 \le \cdots \le x_n,$

then we seek the second-order moments of the $x_i$, $x_j$, namely, the variances
$\sigma_i^2$ and covariances $\sigma_{ij}$. The first moments have been tabulated [6] for samples
of $n \le 100$.

The second moments involve integrals which at first sight look more difficult 
than the corresponding ones for the normal distribution, which latter have 
required a very extensive amount of numerical integration. In this paper a 
method is shown for evaluating the extreme-value integrals in closed form
((3.10) below) involving only tabulated functions. Thus, the extreme-value 
distribution is brought into the select circle, which previously included only the 
normal (at least for n < 6—see [3]), exponential, and rectangular distributions, 
and perhaps some others, for which the second moments of the order statistics 
can be evaluated explicitly without quadratures. 


Received 12/4/52.

¹ This paper is based on research sponsored by the National Advisory Committee for
Aeronautics.


2. Theory. The density function of the $i$th order statistic, $x_i$, from the
distribution (1.1) is

(2.1) $p(x) = \dfrac{n!}{(i-1)!\,(n-i)!}\,[F(x)]^{i-1}[1 - F(x)]^{n-i} f(x), \quad -\infty < x < \infty,$

where $x = x_i$, $f(x) = F'(x)$. The joint density function of the $i$th and $j$th order
statistics is

(2.2) $p(x, y) = \dfrac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,[F(x)]^{i-1}[F(y) - F(x)]^{j-i-1}[1 - F(y)]^{n-j} f(x) f(y), \quad -\infty < x \le y < \infty,$

where $x = x_i$, $y = x_j$, $i < j$, $i, j = 1, 2, \cdots, n$. Without loss of generality, we
shall henceforth refer only to the standardized or "reduced" extreme-value
distribution, with the parameters $\beta = 1$, $u = 0$,

(2.3) $P(y) = \exp(-e^{-y}), \quad -\infty < y < \infty,$

and denote its variable by $y$.

From the density functions (2.1) and (2.2) we obtain

(2.4) $E(y_i^k) = \dfrac{n!}{(i-1)!\,(n-i)!} \displaystyle\int_{-\infty}^{\infty} x^k e^{-x - i e^{-x}} (1 - e^{-e^{-x}})^{n-i}\,dx = \dfrac{n!}{(i-1)!\,(n-i)!} \sum_{r=0}^{n-i} (-1)^r \binom{n-i}{r} \int_{-\infty}^{\infty} x^k e^{-x - (i+r) e^{-x}}\,dx,$

(2.5) $E(y_i y_j) = \dfrac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{y} xy\, e^{-x - i e^{-x}}\, e^{-y - e^{-y}} (e^{-e^{-y}} - e^{-e^{-x}})^{j-i-1} (1 - e^{-e^{-y}})^{n-j}\,dx\,dy$
$\qquad = \dfrac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \displaystyle\sum_{r=0}^{j-i-1} \sum_{s=0}^{n-j} (-1)^{r+s} \binom{j-i-1}{r} \binom{n-j}{s}\, \phi(i + r,\ j - i - r + s),$

where the function $\phi$ is the double integral

(2.6) $\phi(t, u) = \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{y} xy\, e^{-x - t e^{-x}}\, e^{-y - u e^{-y}}\,dx\,dy, \quad t, u > 0,$

whose evaluation is the main point of this paper.


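3. Evaluation of the integrals.
3.1 Variance-type integrals. These integrals are of the general form

$g_k(c) = \displaystyle\int_{-\infty}^{\infty} x^k e^{-x - c e^{-x}}\,dx, \quad c > 0.$

The evaluation given here is not new, but is presented for completeness.
The change of variable $e^{-x} = v$ gives

$g_k(c) = \displaystyle\int_0^{\infty} (-\log v)^k e^{-cv}\,dv,$

which for $k$ a nonnegative integer is

(3.1) $g_k(c) = (-1)^k \dfrac{d^k}{dt^k} \displaystyle\int_0^{\infty} v^{t-1} e^{-cv}\,dv \,\Big|_{t=1} = (-1)^k \dfrac{d^k}{dt^k} \left[\Gamma(t)\, c^{-t}\right]_{t=1}.$

The needed first two values are

(3.2) $g_1(c) = -\dfrac{1}{c}\left[\Gamma'(1) - \log c\right] = \dfrac{1}{c}(\gamma + \log c),$

where $\gamma = -\Gamma'(1)$ is Euler's constant, $.5772156649 \cdots$. Likewise,

(3.3) $g_2(c) = \dfrac{1}{c}\left[\dfrac{\pi^2}{6} + (\gamma + \log c)^2\right].$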
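
As an illustrative check, not part of the original derivation, the closed forms can be compared with a direct numerical quadrature of the defining integral. The sketch below takes (3.2) and (3.3) to read $g_1(c) = (\gamma + \log c)/c$ and $g_2(c) = [\pi^2/6 + (\gamma + \log c)^2]/c$; the function names, integration limits, and step count are assumptions chosen for the example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma = -Gamma'(1)


def g1(c: float) -> float:
    """Closed form (3.2): g_1(c) = (gamma + log c) / c."""
    return (EULER_GAMMA + math.log(c)) / c


def g2(c: float) -> float:
    """Closed form (3.3): g_2(c) = [pi^2/6 + (gamma + log c)^2] / c."""
    return (math.pi ** 2 / 6 + (EULER_GAMMA + math.log(c)) ** 2) / c


def g_numeric(k: int, c: float, lo: float = -10.0, hi: float = 50.0,
              steps: int = 60000) -> float:
    """Trapezoid-rule value of the integral of x^k exp(-x - c e^{-x}).

    The limits lo and hi are assumed wide enough that the neglected tails
    are negligible for c of order one.
    """
    h = (hi - lo) / steps
    total = 0.0
    for m in range(steps + 1):
        x = lo + m * h
        weight = 0.5 if m in (0, steps) else 1.0
        total += weight * x ** k * math.exp(-x - c * math.exp(-x))
    return total * h


for c in (1.0, 2.0, 3.0):
    print(c, g1(c), g_numeric(1, c), g2(c), g_numeric(2, c))
```

The quadrature is only a cross-check; the point of (3.2) and (3.3) is that no numerical integration is needed.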


3.2 Covariance integrals. An integration by parts applied to the inner integral
in (2.6) with '$dv$' equal to the exponential factor gives

$\displaystyle\int_{-\infty}^{y} x e^{-x - t e^{-x}}\,dx = t^{-1} y e^{-t e^{-y}} - t^{-1} \int_{-\infty}^{y} e^{-t e^{-x}}\,dx.$

Hence from (2.6) and (3.1),

(3.4) $t\,\phi(t, u) = g_2(t + u) - \psi(t, u),$

where

(3.5) $\psi(t, u) = \displaystyle\int_{-\infty}^{\infty} y e^{-y - u e^{-y}} \left[\int_{-\infty}^{y} e^{-t e^{-x}}\,dx\right] dy.$

The function $\psi$ regarded as a simple integral containing a parameter ($t > 0$)
may be differentiated under the integral sign, giving, by (3.2),

(3.6) $\dfrac{\partial \psi}{\partial t} = -\dfrac{1}{t}\, g_1(t + u) = -\dfrac{1}{t(t + u)}\left[\gamma + \log (t + u)\right], \quad t, u > 0.$

Before integrating this equation, it is convenient to make the change of variable
$w = 1 + (t/u)$. After the substitution integrate (3.6) with respect to $w$ from
$w = 2$ to $w = w$, and replace the upper limit $w$ in the resulting expression by
its value in terms of $t$, noting that the corresponding limits for $t$ are $t = u$ to
$t = t$. The result is

(3.7) $u[\psi(t, u) - \psi(u, u)] = \gamma \log (1 + u/t) + \tfrac{1}{2}[\log (t + u)]^2 - \log u \log (t/u) - \gamma \log 2 - \tfrac{1}{2}(\log 2u)^2 - \displaystyle\int_2^{1 + t/u} \dfrac{\log w}{w - 1}\,dw.$






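The integral on the right is immediately expressible in terms of Spence's integral
(or function)

(3.8) $L(1 + x) = \displaystyle\int_1^{1+x} \dfrac{\log w}{w - 1}\,dw = \sum_{n=1}^{\infty} (-1)^{n-1} \dfrac{x^n}{n^2}.$

Several tables of this function are cited in [2]. The most extensive of these is
given by F. W. Newman [7] to twelve decimal places.

It remains only to evaluate $\psi(u, u)$. From (2.6) and (3.2),

$\phi(u, u) = \displaystyle\int_{-\infty}^{\infty} y e^{-y - u e^{-y}} \left(\int_{-\infty}^{y} x e^{-x - u e^{-x}}\,dx\right) dy = \dfrac{1}{2} \int_{-\infty}^{\infty} \dfrac{d}{dy} \left(\int_{-\infty}^{y} x e^{-x - u e^{-x}}\,dx\right)^2 dy = \dfrac{1}{2u^2} (\gamma + \log u)^2.$

This value when substituted in (3.4) gives $\psi(u, u)$. Combining this result with
(3.7) and the easily obtainable value $L(2) = \pi^2/12$ gives, after a little algebra,
the following formula:

(3.9) $2tu\,\phi(t, u) = (u - t)\, g_2(t + u) + [t g_1(t)]^2 + 2L\left(1 + \dfrac{t}{u}\right) - \left(\log \dfrac{t}{u}\right)^2 - \dfrac{\pi^2}{6},$

where the functions $g_1(t)$, $g_2(t)$ are given by (3.2), (3.3). This may be simplified
a little by use of the following property of Spence's function:

$L(1 + x) + L\left(1 + \dfrac{1}{x}\right) = \tfrac{1}{2}(\log x)^2 + \dfrac{\pi^2}{6},$

giving the result

(3.10) $2tu\,\phi(t, u) = (u - t)\, g_2(t + u) + [t g_1(t)]^2 - 2L\left(1 + \dfrac{u}{t}\right) + \dfrac{\pi^2}{6}.$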


The above results (3.9), (3.10), together with (2.4), (2.5), make possible the
evaluation of all the variances and covariances. This requires the calculation of
$n$ values of $g_1$ and of $g_2$, and $n(n - 1)$ values of $\phi$.

The calculation may be simplified with the aid of the relation

(3.11) $\phi(t, u) + \phi(u, t) = g_1(t)\, g_1(u),$

which may be derived from (2.6) and (3.2) by means of a change in the order of
integration. Thus (3.10) need be used only for $t \le u$, so that (3.11) reduces the
number of values of $\phi$ by almost half, unless $n$ is small, say $n < 10$.
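
The evaluation of $\phi(t, u)$ can be sketched in a few lines. The code below is an illustration under stated assumptions, not the author's procedure: it evaluates Spence's function by its power series $\sum (-1)^{n-1} x^n/n^2$ (valid for $0 \le x \le 1$) instead of using tables, and for each pair it picks whichever of the two closed forms keeps the series argument at most 1, namely the form with $L(1 + t/u)$ when $t \le u$ and the form with $L(1 + u/t)$ otherwise. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def g1(c: float) -> float:
    return (EULER_GAMMA + math.log(c)) / c


def g2(c: float) -> float:
    return (math.pi ** 2 / 6 + (EULER_GAMMA + math.log(c)) ** 2) / c


def spence_L(x: float) -> float:
    """L(1 + x) by the series sum_{n>=1} (-1)^(n-1) x^n / n^2, 0 <= x <= 1."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("series used here requires 0 <= x <= 1")
    if x == 1.0:
        return math.pi ** 2 / 12  # L(2); the series converges slowly at x = 1
    total, n, power = 0.0, 1, x
    while power / (n * n) > 1e-17:
        total += (-1) ** (n - 1) * power / (n * n)
        n += 1
        power *= x
    return total


def phi(t: float, u: float) -> float:
    """phi(t, u) from the two closed forms, chosen so the series applies."""
    common = (u - t) * g2(t + u) + (t * g1(t)) ** 2
    if t <= u:
        val = (common + 2 * spence_L(t / u)
               - math.log(t / u) ** 2 - math.pi ** 2 / 6)
    else:
        val = common - 2 * spence_L(u / t) + math.pi ** 2 / 6
    return val / (2 * t * u)


print(phi(1, 1), phi(1, 2), phi(2, 1))
print(phi(1, 2) + phi(2, 1), g1(1) * g1(2))  # the two sides of (3.11)
```

Relation (3.11) gives an inexpensive consistency check on any pair of computed values, as the last line shows.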


4. Illustration. The above formulas have been used by the author in an inves- 
tigation of estimation of extreme-value parameters by means of order statistics. 







The results of this research, including a table of the first two moments for small 
samples, will be reported elsewhere. 

The following computations for n = 3 illustrate the procedure described in 
this article. 

From (3.2),

$g_1(1) = \gamma = 0.57721\,57$
$g_1(2) = 0.63518\,14$
$g_1(3) = 0.55860\,93.$

The means are then given by (2.4) and (3.2) as

$E(y_1) = 3[g_1(1) - 2g_1(2) + g_1(3)] = -0.40361\,4$
$E(y_2) = 6[g_1(2) - g_1(3)] = +0.45943\,3$
$E(y_3) = 3g_1(3) = +1.67582\,8.$
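
The three means can be reproduced from (2.4) with $k = 1$, which reduces to an alternating binomial sum of values of $g_1$. The following is a minimal sketch under that reading; the function name `mean_order_stat` is an assumption introduced for the example.

```python
import math
from math import comb, factorial

EULER_GAMMA = 0.5772156649015329


def g1(c: float) -> float:
    """g_1(c) = (gamma + log c) / c, as in (3.2)."""
    return (EULER_GAMMA + math.log(c)) / c


def mean_order_stat(i: int, n: int) -> float:
    """E(y_i) for the i-th smallest of n reduced extreme-value observations,
    using the binomial expansion in (2.4) with k = 1."""
    coef = factorial(n) // (factorial(i - 1) * factorial(n - i))
    return coef * sum((-1) ** r * comb(n - i, r) * g1(i + r)
                      for r in range(n - i + 1))


means = [mean_order_stat(i, 3) for i in (1, 2, 3)]
print(means)
print(sum(means), 3 * EULER_GAMMA)  # the means should sum to n * gamma
```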


As a simple check, these three values sum to 3y to within six decimal places, 
and also agree with those in [6]. (The notation in the table cited differs from 
that used here: F(y;) in this paper corresponds to E(y,_,) in the table.) Next, 
from (3.3), we have

$g_2(1) = \dfrac{\pi^2}{6} + \gamma^2 = 1.97811\,2$
$g_2(2) = 1.62937\,8$
$g_2(3) = 1.48444\,4.$

The mixed function $\phi(t, u)$ is then given by (3.9) and (3.10):

$\phi(1, 1) = \tfrac{1}{2}\gamma^2 = 0.16658\,9$
$\phi(1, 2) = \tfrac{1}{4}\{g_2(3) + \gamma^2 + 2L(1\tfrac{1}{2}) - (\ln 2)^2 - \pi^2/6\} = 0.14726\,6$
$\phi(2, 1) = \tfrac{1}{4}\{-g_2(3) + 4[g_1(2)]^2 - 2L(1\tfrac{1}{2}) + \pi^2/6\} = 0.21937\,1.$


Newman's table [7] provides the value of the function $L(1\tfrac{1}{2}) = 0.44841\,42069$.
Finally, equations (2.4) and (2.5) give, for the moments about the origin,

$\| E(y_i y_j) \| = \begin{Vmatrix} 0.61140 & 0.11594 & -0.43263 \\ 0.11594 & 0.86960 & 1.31622 \\ -0.43263 & 1.31622 & 4.45333 \end{Vmatrix},$

whence the moments about the mean are given by

$\| \sigma(y_i y_j) \| = \begin{Vmatrix} 0.44850 & 0.30137 & 0.24376 \\ 0.30137 & 0.65852 & 0.54629 \\ 0.24376 & 0.54629 & 1.64493 \end{Vmatrix}.$







The final results are correct to about four decimal places. (One additional place 
is shown for checking purposes.) 
As a check, we should have

$\displaystyle\sum_{i=1}^{3} \sum_{j=1}^{3} \sigma(y_i y_j) = \sigma^2\left(\sum_{i=1}^{3} y_i\right) = 9\sigma^2(\bar{y}) = 3\sigma_y^2 = \dfrac{\pi^2}{2},$

since $\sigma_y^2$, the variance of the distribution $P(y)$ in (2.3), is known to be $\pi^2/6$.
The left side of this equation is found to be 4.93479; the right side, 4.93480.
This type of check cannot be considered to be very effective, however, as only
gross errors, and not compensating ones, will ordinarily be revealed.
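
The whole computation for $n = 3$ can be sketched end to end. The code below is an illustration under several assumptions: it reads (2.4) and (2.5) as alternating binomial sums of $g_k(i + r)$ and $\phi(i + r,\ j - i - r + s)$, evaluates Spence's function by its power series in place of tables, and uses hypothetical function names throughout. It is not the author's computing scheme.

```python
import math
from math import comb, factorial

GAMMA = 0.5772156649015329


def g1(c):
    return (GAMMA + math.log(c)) / c


def g2(c):
    return (math.pi ** 2 / 6 + (GAMMA + math.log(c)) ** 2) / c


def spence_L(x):
    """L(1 + x) by power series for 0 <= x < 1; L(2) = pi^2/12 at x = 1."""
    if x == 1.0:
        return math.pi ** 2 / 12
    total, n, power = 0.0, 1, x
    while power / (n * n) > 1e-17:
        total += (-1) ** (n - 1) * power / (n * n)
        n += 1
        power *= x
    return total


def phi(t, u):
    """Mixed integral phi(t, u), using the closed form whose series applies."""
    common = (u - t) * g2(t + u) + (t * g1(t)) ** 2
    if t <= u:
        val = common + 2 * spence_L(t / u) - math.log(t / u) ** 2 - math.pi ** 2 / 6
    else:
        val = common - 2 * spence_L(u / t) + math.pi ** 2 / 6
    return val / (2 * t * u)


def moment(i, n, k):
    """E(y_i^k) for k = 1 or 2, from the expansion in (2.4)."""
    g = g1 if k == 1 else g2
    coef = factorial(n) // (factorial(i - 1) * factorial(n - i))
    return coef * sum((-1) ** r * comb(n - i, r) * g(i + r)
                      for r in range(n - i + 1))


def product_moment(i, j, n):
    """E(y_i y_j) for i < j, from the double sum in (2.5)."""
    coef = factorial(n) // (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
    total = 0.0
    for r in range(j - i):
        for s in range(n - j + 1):
            total += ((-1) ** (r + s) * comb(j - i - 1, r) * comb(n - j, s)
                      * phi(i + r, j - i - r + s))
    return coef * total


def covariance_matrix(n):
    means = [moment(i, n, 1) for i in range(1, n + 1)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        cov[i - 1][i - 1] = moment(i, n, 2) - means[i - 1] ** 2
        for j in range(i + 1, n + 1):
            c = product_moment(i, j, n) - means[i - 1] * means[j - 1]
            cov[i - 1][j - 1] = cov[j - 1][i - 1] = c
    return cov


cov = covariance_matrix(3)
for row in cov:
    print(["%.5f" % v for v in row])
print(sum(map(sum, cov)), math.pi ** 2 / 2)  # check: total should be n * pi^2 / 6
```

Because the sums alternate in sign, double precision is adequate for small $n$ only; for larger $n$ the loss of significance noted in the text would call for extended-precision arithmetic.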

The reader should be cautioned that, unless n is fairly small, it may be neces- 
sary to carry out the calculations to a considerably greater number of places 
than is desired in the results. This results from the presence of binomial co- 
efficients and alternating signs in formulas (2.4) and (2.5), both of which operate 
to reduce accuracy rapidly as n increases. 

REFERENCES

[1] R. A. FISHER AND L. H. C. TIPPETT, "Limiting forms of the frequency distribution of the
largest or smallest member of a sample," Proc. Cambridge Philos. Soc., Vol. 24
(1928), pp. 180-190.

[2] A. FLETCHER, J. C. P. MILLER, AND L. ROSENHEAD, An Index of Mathematical Tables,
McGraw-Hill Book Company, Inc., 1946, pp. 343-344.

[3] H. J. GODWIN, "Some low moments of order statistics," Ann. Math. Stat., Vol. 20
(1949), pp. 279-285.

[4] E. J. GUMBEL, "Les valeurs extrêmes des distributions statistiques," Ann. Inst. H.
Poincaré, Vol. 4 (1935), pp. 115-158.

[5] E. J. GUMBEL, "The return period of flood flows," Ann. Math. Stat., Vol. 12 (1941),
pp. 163-190.

[6] Table of the First Moment of Ranked Extremes, National Bureau of Standards Report
1167, September 20, 1951 (special report submitted to the National Advisory
Committee for Aeronautics).

[7] F. W. NEWMAN, The Higher Trigonometry, Superrationals of Second Order, Macmillan
and Bowes, Cambridge University Press, (1892), pp. 64-65.





GENERALIZED HIT PROBABILITIES WITH A GAUSSIAN TARGET, I 


By D. A. S. FRASER
University of Toronto 


1. Summary. In a recent paper [2] the author developed a discrete distribution 
and several derived limiting distributions for the number of “hits” on a k- 
dimensional Gaussian target. The purpose of the present paper is to apply these 
results to the two-dimensional problem considered by Cunningham and Hynd 
[1]. A general expression and two limiting forms are obtained for the probability 
of at least one hit. The numerical evaluation using the data in [1] is considered 
for n = 5 rounds, and the probability of at least one hit is plotted in Fig. 1 for 
various combinations of aiming and dispersion error. For a given over-all time 
interval the evaluation for large n is discussed in Section 5 and illustrated using 
the data from [1]. 


2. Introduction. In 1946 Cunningham and Hynd considered a problem in 
aerial gunnery: to find the probability of hitting a moving target at least once. 
The various factors entering into the problem may be described as follows. 
The point at which the gun is aiming is found to wander back and forth across 
the target; its successive positions when n rounds are fired can be represented by 
a multivariate normal distribution with independence between the horizontal 
and vertical coordinates. We let the coordinates of this point of aim for the $i$th
round be $x_{1i}$, $x_{2i}$ (the point of aim is called a prediction in [2]). The dispersion
error of the gun is also assumed to be normal; we let the trajectory coordinates
be $y_{1i}$, $y_{2i}$.

In [1] the target was taken to be circular. Here we assume that it is Gaussian 
diffuse; that is, the probability of a hit is given up to a constant factor by a 
Gaussian p.d.f. of the coordinates of the trajectory. Because of the irregular 
outline of a plane and the sharp “drop off” of the p.d.f. proceeding out from the 
center, this is not an unreasonable assumption. 


Received 10/1/51, revised 11/20/52.


3. The k-dimensional problem. The $k$-dimensional problem treated in [2]
may be summarized as follows. A series of $n$ predictions $\{X_i;\ i = 1, \cdots, n\}$
is considered; a prediction $X_i$ is a random vector in $k$-dimensional Euclidean
space $R^k$ and we let $X_i = \{X_{\mu i};\ \mu = 1, \cdots, k\}$. The distribution of the $n$
predictions is assumed to be Gaussian with independence between the $n$ values of
any coordinate and the $n$ values of any other. Letting $\{X_{\mu i};\ i = 1, \cdots, n\}$
be Gaussian with mean $\{m_{\mu i};\ i = 1, \cdots, n\}$ and covariance matrix $\| \sigma_{ij}^{(\mu)} \|$,
then $\{X_{\mu i};\ i = 1, \cdots, n\}$ and $\{X_{\nu i};\ i = 1, \cdots, n\}$ are assumed independent
for $\mu \ne \nu$. A prediction $X_i = \bar{x}_i = (x_{1i}, \cdots, x_{ki})$ becomes a successful
prediction with probability given by $s_i(\bar{x}_i)$, the success function. In [2] $s_i(\bar{x}_i)$ has the
following Gaussian form:

(3.1) $s_i(\bar{x}_i) = C_i \exp\left\{-\tfrac{1}{2} \displaystyle\sum_{\mu\nu} \tau_i^{\mu\nu} x_{\mu i} x_{\nu i}\right\},$

where $0 \le C_i \le 1$, $\| \tau_i^{\mu\nu} \|$ is positive definite, and $\mu$, $\nu$ range over the set
$\{1, \cdots, k\}$. The general distribution obtained in [2] is the distribution of $R$, the number
of successful predictions.

In applying the results obtained in [2] to the Cunningham and Hynd problem,
it is found that the success function is not immediately available in terms of the
prediction $\bar{x}$; rather, it is given in terms of a vector $\bar{y}$ which has a Gaussian
distribution about $\bar{x}$. The following lemma shows that, if the success function is
Gaussian of form (3.1) in terms of $\bar{y}$, then it is also Gaussian in terms of $\bar{x}$.

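LEMMA. If $(Y_1, \cdots, Y_k)$ has a Gaussian distribution with mean $(x_1, \cdots, x_k)$
and covariance matrix $\| G_{\mu\nu} \| = \| G^{\mu\nu} \|^{-1}$, and if the success function in terms of
$(y_1, \cdots, y_k)$ is $B \exp\{-\tfrac{1}{2} \sum_{\mu\nu} T^{\mu\nu} y_\mu y_\nu\}$, then the success function in terms of
$(x_1, \cdots, x_k)$ is also Gaussian diffuse and has the form $C \exp\{-\tfrac{1}{2} \sum_{\mu\nu} \tau^{\mu\nu} x_\mu x_\nu\}$,
where

$C = B\, \dfrac{| G^{\mu\nu} |^{1/2}}{| G^{\mu\nu} + T^{\mu\nu} |^{1/2}}, \qquad \| \tau^{\mu\nu} \| = \| G^{\mu\nu} \| - \| G^{\mu\nu} \|\, \| G^{\mu\nu} + T^{\mu\nu} \|^{-1}\, \| G^{\mu\nu} \|.$

PROOF. We calculate the probability of a "hit" as a function of the point
of aim $(x_1, \cdots, x_k)$.

$s(\bar{x}) = E\left\{B \exp\left(-\tfrac{1}{2} \displaystyle\sum_{\mu\nu} T^{\mu\nu} Y_\mu Y_\nu\right)\right\}$

$\quad = B\, \dfrac{| G^{\mu\nu} |^{1/2}}{(2\pi)^{k/2}} \displaystyle\int \exp\left\{-\tfrac{1}{2} \sum_{\mu\nu} G^{\mu\nu} (y_\mu - x_\mu)(y_\nu - x_\nu) - \tfrac{1}{2} \sum_{\mu\nu} T^{\mu\nu} y_\mu y_\nu\right\} \prod_\mu dy_\mu$

$\quad = B\, \dfrac{| G^{\mu\nu} |^{1/2}}{(2\pi)^{k/2}} \exp\left[-\tfrac{1}{2} \displaystyle\sum_{\mu\nu} G^{\mu\nu} x_\mu x_\nu\right] \int \exp\left\{-\tfrac{1}{2} \sum_{\mu\nu} (G^{\mu\nu} + T^{\mu\nu}) y_\mu y_\nu + \sum_\mu \left[\sum_\nu G^{\mu\nu} x_\nu\right] y_\mu\right\} \prod_\mu dy_\mu$

$\quad = B\, \dfrac{| G^{\mu\nu} |^{1/2}}{| G^{\mu\nu} + T^{\mu\nu} |^{1/2}} \exp\left\{-\tfrac{1}{2} \displaystyle\sum_{\mu\nu} G^{\mu\nu} x_\mu x_\nu + \tfrac{1}{2} \sum_{\mu\nu\mu'\nu'} G^{\mu\mu'} \left[\| G^{\mu\nu} + T^{\mu\nu} \|^{-1}\right]_{\mu'\nu'} G^{\nu'\nu} x_\mu x_\nu\right\}$

$\quad = B\, \Big| \delta_{\mu\nu} + \displaystyle\sum_{\nu'} G_{\mu\nu'} T^{\nu'\nu} \Big|^{-1/2} \exp\left\{-\tfrac{1}{2} \sum_{\mu\nu} \left(G^{\mu\nu} - \sum_{\mu'\nu'} G^{\mu\mu'} \left[\| G^{\mu\nu} + T^{\mu\nu} \|^{-1}\right]_{\mu'\nu'} G^{\nu'\nu}\right) x_\mu x_\nu\right\}.$

This completes the proof.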
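
A one-dimensional numerical check makes the content of the lemma concrete. The sketch below is illustrative only and rests on an assumption: for $k = 1$, with dispersion variance $G$ and target coefficient $T$, the Gaussian integral in the proof is taken to give $C = B/\sqrt{1 + GT}$ and $\tau = 1/(G + 1/T)$, that is, the variances $G$ and $1/T$ add. The function names and the numerical values are hypothetical.

```python
import math


def success_in_x(x: float, B: float, T: float, G: float,
                 half_width: float = 12.0, steps: int = 48000) -> float:
    """E[B exp(-T Y^2 / 2)] for Y ~ N(x, G), by the trapezoid rule."""
    sd = math.sqrt(G)
    a, b = x - half_width * sd, x + half_width * sd
    h = (b - a) / steps
    total = 0.0
    for m in range(steps + 1):
        y = a + m * h
        weight = 0.5 if m in (0, steps) else 1.0
        density = math.exp(-(y - x) ** 2 / (2 * G)) / math.sqrt(2 * math.pi * G)
        total += weight * B * math.exp(-0.5 * T * y * y) * density
    return total * h


def closed_form(x: float, B: float, T: float, G: float) -> float:
    """One-dimensional case: C exp(-tau x^2 / 2) with
    C = B / sqrt(1 + G T) and tau = 1 / (G + 1/T)."""
    C = B / math.sqrt(1.0 + G * T)
    tau = 1.0 / (G + 1.0 / T)
    return C * math.exp(-0.5 * tau * x * x)


for x in (0.0, 0.7, 2.0):
    print(x, success_in_x(x, 1.0, 2.0, 0.5), closed_form(x, 1.0, 2.0, 0.5))
```

The agreement of the two columns reflects the fact that averaging a Gaussian target over a Gaussian dispersion error yields another Gaussian target with a larger effective radius and a smaller peak value.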


4. The Cunningham and Hynd problem. Cunningham and Hynd were interested
in the probability $P$ of at least one hit in a series of $n$ rounds. Using
(3.2) in [2] we have the following expression for $P$:

(4.1) $P = E_1 - E_2 + \cdots + (-1)^{n-1} E_n,$

where

(4.2) $E_r = \displaystyle\sum_{\beta_r} E_{\beta_r},$

for which the summation is taken over all sets $\beta_r$ of $r$ integers chosen from the first
$n$ integers and $E_{\beta_r}$ is the probability that all the rounds designated by the elements
of $\beta_r$ will be hits. Our problem is thus to calculate $E_r$ or $E_{\beta_r}$, and the above
formulas will give the desired probability $P$. Any other probabilities for the
distribution of $R$ will be obtained from formula (3.3) in [2].
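
Formulas (4.1) and (4.2) are the inclusion-exclusion expansion of the probability of at least one hit. A minimal sketch follows, assuming only that some function supplies the joint hit probability for any subset of rounds; the name `joint_hit_prob` and the independent-rounds example are hypothetical and are used because the answer is then known in closed form.

```python
from itertools import combinations
from math import prod
from typing import Callable, Tuple


def prob_at_least_one(n: int,
                      joint_hit_prob: Callable[[Tuple[int, ...]], float]) -> float:
    """P = E_1 - E_2 + ... + (-1)^(n-1) E_n, where E_r sums the joint hit
    probability over all subsets of r rounds chosen from the n rounds."""
    total = 0.0
    for r in range(1, n + 1):
        e_r = sum(joint_hit_prob(subset) for subset in combinations(range(n), r))
        total += (-1) ** (r - 1) * e_r
    return total


# Hypothetical example: independent rounds with single-round hit
# probabilities p, so the joint probability of a subset is a product.
p = [0.1, 0.2, 0.3]
independent = lambda subset: prod(p[i] for i in subset)

print(prob_at_least_one(3, independent))
print(1 - prod(1 - pi for pi in p))  # direct value for independent rounds
```

The number of subsets grows as $2^n$, which is why direct use of the expansion is practical only for a small number of rounds.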

We now consider the two-dimensional problem [1] and introduce the following 
notation and formulation. 

The target is given by the probability of a hit on the $i$th round in terms of
the trajectory coordinates (with the center of the target as origin):

$\Pr\{\text{Hit} \mid (y_{1i}, y_{2i})\} = \exp\left\{-\tfrac{1}{2}\, \dfrac{y_{1i}^2 + y_{2i}^2}{\sigma^{2(t)}}\right\},$

where $\sigma^{(t)}$ is a measure of the effective radius of the target. In the notation of
the lemma in Section 3, we have

$B = 1, \qquad \| T^{\mu\nu} \| = \begin{Vmatrix} \sigma^{-2(t)} & 0 \\ 0 & \sigma^{-2(t)} \end{Vmatrix}.$

The dispersion error is given by the covariance matrix on the $i$th round
(independence of coordinates being assumed as in [1]):

$\| G_{\mu\nu} \| = \begin{Vmatrix} \sigma_{1i}^{2(g)} & 0 \\ 0 & \sigma_{2i}^{2(g)} \end{Vmatrix},$

where $\sigma_{\mu i}^{2(g)}$ is the variance of the $\mu$th coordinate of the dispersion error.

The success function is given, using the lemma in Section 3, by the following:

$C_i = \dfrac{\sigma^{2(t)}}{\sigma_{1i}^{(t+g)}\, \sigma_{2i}^{(t+g)}}, \qquad \| \tau_i^{\mu\nu} \| = \begin{Vmatrix} \sigma_{1i}^{-2(t+g)} & 0 \\ 0 & \sigma_{2i}^{-2(t+g)} \end{Vmatrix},$

where $\sigma_{\mu i}^{2(t+g)} = \sigma^{2(t)} + \sigma_{\mu i}^{2(g)}$, the addition of variances.

The aiming error is given by the variance of that error for the $\mu$th coordinate on
the $i$th round, $\sigma_{ii}^{(\mu)} = \sigma_{\mu i}^{2(a)}$, and the correlation between the values of the $\mu$th
coordinate for rounds $i$ and $j$, $\rho_{ij}^{(\mu)}$. We are assuming that the mean is equal to
zero; that is, there is no bias in the aiming.







Using Theorem 3 in [2] we obtain the following expression for Eg,: 


Es, = Il C> I, | Opg + One Tp) S 


peBs 


2(t) 
(a) (a) (we) ~2(t+g) )-4 
Il =e (ito) to) ote) IT | Opg T Fup Fug Pog Fup 
pes, Tip 


2(t+9) |-3 


ay - (a) (a) II vs + By “"5 | 


2(a) 
PEBr Oj p Oey ¥m1L2 Cup 


Considering now the application of the Type I limiting distribution, we have 
the following result: 


2(t+9) 


- 
(#) Fu(ty) 
[ I -—- a) IL Pityta) T Spq 2,2) II dt, 
“0 P=1Oi(ty) F2(ty) # Fu (ty) . 
The duration of the burst is 7’, and Fit.) , for example, is the variance of the aim- 
ing error at time ¢,. From the conditions of Theorem 4 in [2], (4.4) will be a 
° . . ° ° 2(¢) - ' ° 
valid approximation for (4.2) with (4.3) if o “° is of order 1/n with respect to 
o ” and n is large. 
Similarly the Type II limiting distribution gives the following: 


1 _T a a 


me: 
5) Beste | Ue Tote | Uae 


“0 “0 p™=1 01 (6,) F2(8,) O"** siti 


2(t+g) 2(t+9) ° 
The limiting conditions are that o1{'*”, o2;'"” be of order 1/n, 


2(t) 
Co 2(a) 2(a) . 
gi; , and oe,’ of order 1, and n large. 


5. Evaluation for n = 5. The evaluation of the probability $P$ requires all
values of $E_r$ or a sufficient number to estimate $P$ from a truncated portion of
the series (4.1). This direct evaluation can readily be carried out if $n$ is small
as in the example ($n = 5$) in this section; we defer to Section 6 a consideration
of the evaluation for large $n$.

We use the data in [1] for a one-second burst of $n = 5$ rounds. As indicated
in [1] we take the correlation $\rho_{ij}^{(\mu)}$ to be the same for the horizontal and vertical
coordinates and to be dependent only on the time interval between the rounds.
Then from the experimental correlations tabulated in [1] we have $\rho(0) = 1.00$,
$\rho(.25) = .80$, $\rho(.50) = .62$, $\rho(.75) = .48$, $\rho(1.00) = .33$. From formulas (4.1),
(4.2), and (4.3) we have that


popgtaé Sx 
Qa 


Aig a? — p*(t2 — 1h) 


(5.1) 


\ 


4+... — (— s)’ bm pli, —_ mle —_ 5 po) + a6 ng Ps 


iceee<iy 


where 







2(t) 
Co 
= go)? 
2(t+9) 


oC 
en lt sy 


and the variances $\sigma^{2(t)}$, $\sigma^{2(g)}$, $\sigma^{2(a)}$ are assumed to remain constant for the duration
of the burst. The function $P$ was calculated for a series of values of $\alpha$ and $\beta$
and plotted in Fig. 1 with o” = 1. 








FIG. 1


6. Evaluation for large n. The direct computation of P for a series of large 
values of n would be excessive. We now introduce a procedure for approxima- 
tion. Assuming that the correlation depends only on the time interval between 
the rounds, that the horizontal and vertical components have the same dis- 
tribution, and that variances are constant over the time interval, then P has 


the form (5.1). 
Consider for a moment the case in which correlation is absent ($\rho_{ij} = \delta_{ij}$),
and let
$F_r = E_r(\rho_{ij} = \delta_{ij})$


- (") B |’ 
~ Ard lal’ 
The expression for P also simplifies: 
P=as- + (5) a peas (-a)"(*) + 
a «/ a ry a 


This binomial expression has simple terms and it is natural to investigate the 
correction factor between terms in the correlated and uncorrelated series. We 







therefore define $c_r$ as follows:

$E_r = F_r(1 + c_r)$, $c_0 = c_1 = 0$.

We now derive an expression for $c_r$ from which it easily follows that $c_r$ is
nonnegative.
1 


za | aBpq + Ppa(l — Sy—) |” 


a _ a 
geet n 
r 


FIG. 2. $k_2$ against $n$; $k_3$ against $n$.














n Br » Ppe a a 
( ) bpq + —— (1 — Spe) 
r Qa 

From Theorem 4 in [2] we know that $c_r$ approaches a limit as $n \to \infty$. This can
also be seen directly since $\alpha > 1$ ($\alpha = 1$ would imply there was no target!).
The limiting value is derived from Theorem 4, as

$$\lim_{n \to \infty} c_r = \frac{1}{T^r} \int_0^T \cdots \int_0^T \left| \delta_{pq} + \frac{\rho(t_p - t_q)}{\alpha}(1 - \delta_{pq}) \right|^{-1} dt_1 \cdots dt_r - 1.$$


Since $c_r$ is stable for large $n$ we now investigate the dependence on $r$. We note
that $\lim_n c_r$ is the excess over 1 of the average value of the reciprocal of a
determinant having 1's down the diagonal, and that these 1's are larger by $(\alpha - 1)/\alpha$
than elements which would be sufficient to make the matrix positive definite.

Thus assuming $\alpha > 1$ and expanding the determinant we find that the
determinant is $1 - \sum_{p<q} \{\rho(t_p - t_q)/\alpha\}^2 + \cdots$, where $p$, $q$ range over $(1, \cdots, r)$.
Formally taking the reciprocal, we find that the inverse of the determinant is
$1 + \sum_{p<q} \{\rho(t_p - t_q)/\alpha\}^2 + \cdots$. The excess over 1 of the average of such
expressions will have a first term which is the sum of $\binom{r}{2}$ squares of the form
$[\rho(t_p - t_q)/\alpha]^2$.
This expression suggests replacing our correction factor $c_r$ by $k_r$ defined by
$k_r = \alpha^2 c_r / \binom{r}{2}$. Thus we have $E_r = F_r\{1 + \binom{r}{2} k_r/\alpha^2\}$. The correction
constant $k_r$ has the following properties:

(1) $k_r$ approaches a limit as $n$ increases.

(2) $k_r$ is to the first approximation independent of $r$ and $\alpha$ for $\alpha$ large.

Values of $k_2$ and $k_3$ were calculated for a series of $n$, using the time interval
$T = 1$ as in Section 5 and the correlation function $\rho(t)$ given in [1]. These are
plotted in Fig. 2. It is to be noted that the approach to a limiting value as $n$
increases seems very regular and $k_2$ and $k_3$ are quite similar except for the smaller
values of $\alpha$. This stability of the functions $\{k_r\}$ would facilitate the calculations
if $P$ were to be obtained for a series of large values of $n$.
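
As a rough numerical illustration of property (2), the limiting value of $c_2$ can be approximated for $r = 2$, where the determinant reduces to $1 - \{\rho(t_1 - t_2)/\alpha\}^2$. The sketch below is not the computation behind Fig. 2: it assumes the limiting form "average reciprocal determinant minus 1" described in the text, takes $T = 1$, and uses a piecewise-linear interpolation of the five tabulated correlations of Section 5. The interpolation rule and the names `rho`, `limiting_c2`, and `k2` are assumptions made for illustration only.

```python
# Sketch only: rho() interpolates the five tabulated correlations linearly,
# which is an assumption made for illustration, not part of the source.
TABLE = [(0.00, 1.00), (0.25, 0.80), (0.50, 0.62), (0.75, 0.48), (1.00, 0.33)]


def rho(lag):
    """Piecewise-linear interpolation of the tabulated correlation for |lag| <= 1."""
    lag = abs(lag)
    for (t0, r0), (t1, r1) in zip(TABLE, TABLE[1:]):
        if lag <= t1:
            return r0 + (r1 - r0) * (lag - t0) / (t1 - t0)
    return TABLE[-1][1]


def limiting_c2(alpha, T=1.0, grid=200):
    """Midpoint-rule average over [0, T]^2 of 1/det - 1, det = 1 - (rho/alpha)^2."""
    step = T / grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            # Midpoints (i + 1/2) * step and (j + 1/2) * step differ by (i - j) * step.
            x = rho((i - j) * step) / alpha
            total += 1.0 / (1.0 - x * x) - 1.0
    return total / (grid * grid)


def k2(alpha):
    """k_r = alpha^2 * c_r / C(r, 2); for r = 2 the binomial coefficient is 1."""
    return alpha ** 2 * limiting_c2(alpha)


for alpha in (2.0, 10.0, 100.0):
    print(alpha, round(k2(alpha), 4))
```

Under these assumptions the printed values of `k2` for the two larger values of `alpha` should be nearly equal, which is the sense in which $k_r$ is approximately independent of $\alpha$ when $\alpha$ is large.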


7. Acknowledgment. The author wishes to express his appreciation to Professor
S. S. Wilks and Professor John W. Tukey for valuable discussions of the problem.


REFERENCES 
[1] L. B. C. CUNNINGHAM AND W. R. B. HYND, "Random processes in air warfare," J.
Roy. Stat. Soc., Suppl., Vol. 8 (1946), pp. 62-85.
[2] D. A. S. FRASER, "Generalized hit probabilities with a Gaussian target," Ann. Math.
Stat., Vol. 22 (1951), pp. 248-255.





NOTES 


A CLASS OF MINIMAX TESTS FOR ONE-SIDED COMPOSITE
HYPOTHESES¹


By S. G. Allen, Jr.


Stanford University


Summary. Fixed sample-size procedures are considered for testing a one-sided
composite hypothesis concerning a real, one-dimensional parameter of an
exponential distribution (1.1). In particular, conditions are studied such that the
minimax tests have a critical region which is a semi-infinite interval on the real
line.


1. Statement of the problem. Let $X$ be a real-valued, one-dimensional random
variable with the probability density

(1.1) $p(x, \theta) = \omega(\theta)\psi(x)e^{\theta x}$,

where

(1.2) $\omega(\theta) = \left[\int_{-\infty}^{\infty} e^{\theta x}\psi(x)\,dx\right]^{-1}$

is a positive, bounded, continuous function of the real variable $\theta$ and where $\psi(x)$
is a continuous, nonnegative function of the real variable $x$. Let $X_1, X_2, \cdots, X_n$
denote $n$ independent observations on $X$, and let $T(X_1, \cdots, X_n)$ denote a
fixed sample-size procedure based on the $n$ observations for testing the composite
hypothesis $\theta > \theta_0$ against the alternative $\theta < \theta_0$. The loss functions are
defined as follows: if the hypothesis is rejected, the loss is $w_1(\theta) \ge 0$ for $\theta > \theta_0$
and $w_1(\theta) = 0$ otherwise; if the hypothesis is accepted, the loss is $w_2(\theta) \ge 0$
for $\theta < \theta_0$ and $w_2(\theta) = 0$ otherwise. Furthermore, it is assumed that the function
$w_1(\theta)$ is actually positive for at least one value of $\theta > \theta_0$, and $w_2(\theta)$ is positive
for at least one value of $\theta < \theta_0$. The problem to be considered is the selection
of a minimax test procedure $T(X_1, \cdots, X_n)$ under these conditions.


2. A class of minimax tests. For testing the simple hypothesis $\theta = \theta_2$ in (1.1)


Received 11/5/51, revised 11/2/52. 

¹ This paper, representing work done under the sponsorship of the Office of Naval
Research, was presented at the Western meetings of the Institute of Mathematical Statistics,
June 15-16, 1951, Santa Monica, California. Discussion with members of the Department of
Statistics, Stanford University, in particular with Professor M. A. Girshick, was most
beneficial in the formulation of the present draft of the paper. The author understands that
results similar to those of the present paper were obtained for the sequential case by Milton
Sobel in his doctoral thesis, "An essentially complete class of decision functions for certain
standard sequential problems."




against the simple alternative $\theta = \theta_1$ with $\theta_1 < \theta_2$, the minimax procedure based
on $n$ independent observations on $X$ is well known [1]. The value of the statistic

(2.1) $\lambda = \lambda(\theta_1, \theta_2) = \prod_{i=1}^{n} \frac{p(x_i, \theta_2)}{p(x_i, \theta_1)}$

is computed from the observed values of $X$ in the sample. The hypothesis is
then accepted if $\lambda > c$ and rejected if $\lambda < c$, where the criterion $c$ satisfies

(2.2) $w_1(\theta_2)\Pr(\lambda \le c \mid \theta_2) = w_2(\theta_1)\Pr(\lambda > c \mid \theta_1)$.

This value of $c$ is

(2.3) $c = \frac{w_2(\theta_1)\,g}{w_1(\theta_2)(1 - g)}$,

where $g$ is the least favorable a priori probability that $\theta = \theta_1$.

From the form of the density function (1.1), it is clear that an identical
procedure to the preceding ratio test specifies acceptance of the hypothesis if
and only if $\sum_{i=1}^{n} x_i > k$, where

(2.4) $k = \frac{1}{\theta_2 - \theta_1}\left[\log c + n \log \frac{\omega(\theta_1)}{\omega(\theta_2)}\right]$.

Since the probability density of the statistic $\sum_{i=1}^{n} x_i$ is again of the form (1.1)
(see Section 4 of [2]), the discussion of tests like the above is not restricted by
an assumption that the sample consists of a single observation on $X$. Therefore,
the number $k$ defined in (2.4) may be determined by a condition equivalent
to (2.2) with $n = 1$, namely,

(2.5) $w_1(\theta_2)\Pr(X \le k \mid \theta_2) = w_2(\theta_1)\Pr(X > k \mid \theta_1)$.
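
The equivalence between the ratio test on $\lambda$ and the threshold test on the sum of the observations can be checked numerically in a special case. The sketch below assumes a normal distribution with mean $\theta$ and unit variance, for which $\omega(\theta) = e^{-\theta^2/2}$ in the form (1.1); the sample values, the parameter values, and the function names are hypothetical and chosen only for illustration.

```python
import math


def omega(theta):
    """Normalizer of N(theta, 1) written as omega(theta) * psi(x) * exp(theta * x)."""
    return math.exp(-0.5 * theta * theta)


def density(x, theta):
    """Normal density with mean theta and unit variance."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def log_lambda(xs, theta1, theta2):
    """Logarithm of the ratio (2.1): product of p(x_i, theta2) / p(x_i, theta1)."""
    return sum(math.log(density(x, theta2) / density(x, theta1)) for x in xs)


def threshold_k(c, n, theta1, theta2):
    """Threshold (2.4) on the sum of the n observations."""
    return (math.log(c) + n * math.log(omega(theta1) / omega(theta2))) / (theta2 - theta1)


xs = [0.3, -1.1, 0.8, 1.9]          # hypothetical sample
theta1, theta2, c = -0.5, 1.0, 2.0  # hypothetical parameter values and criterion
k = threshold_k(c, len(xs), theta1, theta2)
gap = log_lambda(xs, theta1, theta2) - math.log(c)
# For this family, log(lambda) - log(c) equals (theta2 - theta1) * (sum(xs) - k),
# so lambda > c exactly when sum(xs) > k.
print(k, gap, (theta2 - theta1) * (sum(xs) - k))
```

The last two printed numbers should agree up to rounding, so the sign of $\log\lambda - \log c$ matches the sign of $\sum x_i - k$ whenever $\theta_2 > \theta_1$.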


Let $T_k(X)$ denote a test procedure specifying acceptance of the hypothesis
$\theta > \theta_0$ if the observed value of $X$ exceeds $k$, and specifying rejection otherwise.
One might ask if such test procedures, which form a class of minimax procedures
in the case of the simple dichotomy, retain this property in the more general
problem of Section 1. If so, does a condition similar to (2.5) determine the
minimax test?

The following theorem supplies an answer.²

THEOREM 1. Let

(2.6) $R_1(k, \theta) = w_1(\theta) \int_{-\infty}^{k} \omega(\theta)\psi(x)e^{\theta x}\,dx$,

(2.7) $R_2(k, \theta) = w_2(\theta) \int_{k}^{\infty} \omega(\theta)\psi(x)e^{\theta x}\,dx$.

² The motivating idea for Theorem 1 was a lot acceptance sampling procedure proposed
in an unpublished paper by Mr. Norman Rudy of Sacramento State College.





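Then $T_k(X)$ is minimax if

(2.8) $\max_{\theta > \theta_0} R_1(k, \theta) = \max_{\theta < \theta_0} R_2(k, \theta)$.

PROOF. Let $R(T, G)$ denote the expected loss of a test $T$ with respect to the
a priori distribution with cdf $G(\theta)$. In particular, for a $k_0$ satisfying (2.8),

(2.9)
$$R(T_{k_0}, G) = \int_{\theta_0}^{\infty} R_1(k_0, \theta)\,dG(\theta) + \int_{-\infty}^{\theta_0} R_2(k_0, \theta)\,dG(\theta)$$
$$\le \int_{\theta_0}^{\infty} \max_{\theta > \theta_0} R_1(k_0, \theta)\,dG(\theta) + \int_{-\infty}^{\theta_0} \max_{\theta < \theta_0} R_2(k_0, \theta)\,dG(\theta)$$
$$= \max_{\theta > \theta_0} R_1(k_0, \theta) = \max_{\theta < \theta_0} R_2(k_0, \theta).$$

Let $\theta_1$, $\theta_2$ be values of $\theta$ such that $\theta_1 \le \theta_0 \le \theta_2$ and

(2.10) $\max_{\theta > \theta_0} R_1(k_0, \theta) = R_1(k_0, \theta_2)$,

(2.11) $\max_{\theta < \theta_0} R_2(k_0, \theta) = R_2(k_0, \theta_1)$.

If $G$ is a distribution concentrating all probability at $\theta_1$ and $\theta_2$, then the equality
sign holds throughout (2.9). Therefore

(2.12) $\max_{G} R(T_{k_0}, G) = \max_{\theta > \theta_0} R_1(k_0, \theta) = \max_{\theta < \theta_0} R_2(k_0, \theta)$.

In particular let $G_0$ be the distribution given by $g = \Pr(\theta = \theta_1)$, $1 - g = \Pr(\theta = \theta_2)$,
where $g$ satisfies

$$e^{(\theta_2 - \theta_1)k_0} = \frac{w_2(\theta_1)\,g\,\omega(\theta_1)}{w_1(\theta_2)(1 - g)\,\omega(\theta_2)}.$$

Clearly $T_{k_0}$ is the Bayes procedure against $G_0$. (Compare $g$ in (2.3) and (2.4).)
Hence

$$\min_{T} R(T, G_0) = R(T_{k_0}, G_0) = \max_{G} R(T_{k_0}, G).$$

Application of the saddle-point theorem of [3] completes the proof.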


3. An example based on the normal distribution. Suppose it is desired to test
the hypothesis that $\theta$, the mean of a normal distribution with variance one, is
positive against the alternative that it is negative, where $w_1(\theta) = \theta$ for $\theta \ge 0$
and $w_2(\theta) = -\theta$ for $\theta < 0$. The functions defined in (2.6) and (2.7) are

$$R_1(k, \theta) = \frac{\theta}{\sqrt{2\pi}} \int_{-\infty}^{k - \theta} e^{-y^2/2}\,dy, \qquad \theta \ge 0,$$

$$R_2(k, \theta) = \frac{-\theta}{\sqrt{2\pi}} \int_{k - \theta}^{\infty} e^{-y^2/2}\,dy, \qquad \theta < 0.$$

Since $R_2(k, -|\theta|) = R_1(-k, |\theta|)$, it follows that $\max_{\theta < 0} R_2(0, \theta) = \max_{\theta > 0} R_1(0, \theta)$,
provided the latter exist. This is certainly the case, since, by L'Hospital's
rule,

$$\lim_{\theta \to \infty} R_1(0, \theta) = \lim_{\theta \to \infty} \frac{\theta^2}{\sqrt{2\pi}} e^{-\theta^2/2} = 0.$$
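
The symmetry used in this example, and the fact that $k = 0$ equalizes the two maxima in condition (2.8), can be checked numerically. The following sketch assumes the risk functions of this section in the closed forms $R_1(k, \theta) = \theta\,\Phi(k - \theta)$ and $R_2(k, \theta) = -\theta\,[1 - \Phi(k - \theta)]$, with $\Phi$ the standard normal cdf; the grid of $\theta$ values is an arbitrary choice made for illustration.

```python
import math


def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def R1(k, theta):
    """Risk of rejecting when theta >= 0: theta * Pr(X <= k), X ~ N(theta, 1)."""
    return theta * Phi(k - theta) if theta >= 0 else 0.0


def R2(k, theta):
    """Risk of accepting when theta < 0: -theta * Pr(X > k), X ~ N(theta, 1)."""
    return -theta * (1.0 - Phi(k - theta)) if theta < 0 else 0.0


grid = [i / 1000.0 for i in range(1, 10001)]   # theta in (0, 10], arbitrary grid
max_R1 = max(R1(0.0, t) for t in grid)
max_R2 = max(R2(0.0, -t) for t in grid)
best = max(grid, key=lambda t: R1(0.0, t))      # grid point where R1(0, .) peaks
print(max_R1, max_R2, best)
```

On this grid the two maxima agree up to rounding, consistent with $T_0$ satisfying (2.8) for these loss functions.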
4. Remarks on the discrete case. The continuous distributions studied in the
preceding sections represent a sub-family of a more general family of distributions
of the form $\omega(\theta)e^{\theta x}\,d\Psi(x)$, where $\Psi$ is a measure on the real numbers and
where

$$\omega(\theta) = \left[\int e^{\theta x}\,d\Psi(x)\right]^{-1}$$

is a positive bounded function of the real variable $\theta$. This family includes many
of the most important distributions encountered in statistics, such as the normal,
$\chi^2$, binomial, negative binomial, and Poisson distributions.

Suppose the distribution under consideration in this family is a discrete
one, and suppose that $\Psi(x)$ assumes jumps at each value of a denumerable,
ordered sequence $(x_1, x_2, \cdots)$. For example, if $X$ is the number of successes
in $n$ Bernoulli trials, the function $\Psi(x)$ assumes jumps at $x = 0, 1, 2, \cdots, n$.
In general, it will not be possible to find a value of $k$ in such a sequence so that
condition (2.8) is fulfilled. However, a randomized mixture of two procedures
$T_k$ and $T_{k'}$ will be a minimax procedure if there exists a pair $(k, k')$ such that

$$\max_{\theta > \theta_0} R_1(k', \theta) < \max_{\theta < \theta_0} R_2(k', \theta),$$

$$\max_{\theta > \theta_0} R_1(k, \theta) > \max_{\theta < \theta_0} R_2(k, \theta),$$

where $k'$ is the next smaller element than $k$ in the sequence $(x_1, x_2, \cdots)$. In
this event, the minimax test procedure consists of the following: reject the
hypothesis $\theta > \theta_0$ if $x < k$; accept the hypothesis if $x > k$; if $x = k$, accept the
hypothesis with probability $f$ and reject with probability $1 - f$, where $f$ satisfies

$$\max_{\theta > \theta_0} [f R_1(k', \theta) + (1 - f) R_1(k, \theta)] = \max_{\theta < \theta_0} [f R_2(k', \theta) + (1 - f) R_2(k, \theta)].$$

REFERENCES 


[1] A. WALD, Statistical Decision Functions, John Wiley and Sons, 1950, p. 17.

[2] M. A. GIRSHICK AND L. J. SAVAGE, "Bayes and Minimax Estimates for Quadratic Loss
Functions," Proceedings of the Second Berkeley Symposium on Mathematical
Statistics and Probability, University of California Press, 1951, pp. 53-73.

[3] J. VON NEUMANN AND O. MORGENSTERN, Theory of Games and Economic Behavior, 2nd
ed., Princeton University Press, 1947, p. 95.





NOTE ON COMPUTATION OF ORTHOGONAL PREDICTORS
By Bradford F. Kimball
New York State Department of Public Service


Summary. The present note calls attention to a simple algorithm for computa- 
tion of the orthogonal matrix associated with the matrix of the normal equations 
of least squares. An application of the “forward” solution associated with the 
orthogonalization process is also pointed out. 


1. Introduction. This note is supplementary to Dwyer’s recent book Linear 
Computations, [1]. The author also acknowledges reference to extensive unpub- 
lished notes on matrix analysis of numerical methods kindly furnished him by 
A. S. Householder.

Orthogonalization of the predicting variables of the least square problem has 
been considered since the introduction of orthogonal polynomials into the least- 
squares problem by Tchebycheff 1853-73 and the significant doctoral dis- 
sertation of the Danish mathematician and actuary, J. P. Gram, in 1879, [3]. 
A discussion of the problem is also to be found in Poincaré’s Calcul des Proba- 
bilités, 1912, Chapter XV. Various methods of simplifying the computations for 
purposes of mass application to statistical data have been studied since then. A 
complete bibliography is beyond the scope of this note. 

The advantages of obtaining the solution of the least squares problem in terms 
of orthogonal predictors are numerous. Perhaps the most obvious are those asso- 
ciated with the resulting simplified expressions for the error formulae, correla- 
tions [2] and sampling variance of the fitted function [5]. Also the orthogonaliza- 
tion of the predicting variables is a starting point for the computation of princi- 
pal components significant to structural relationships in psychology and eco- 
nomics. A further application of the associated “forward” solution is pointed 
out in connection with the lemma stated in Section 4 of the present paper. 

Two solutions are obtained. One is in terms of a slight extension of the algo- 
rithm of the square root method of solving the normal equations [2], and the 
other is in terms of the algorithm of the Gauss-Doolittle method. The first 
method has the advantage that the coefficients of the orthogonal predictors have 
unit sampling variance weight. The second method is based on the more familiar 
Doolittle algorithm and does not involve square roots. 

There seems to be some need for standardization of notation in matrix analy- 
sis. Since Householder has presented a consistent application of matrix analysis 
to a large body of material, we shall keep to his notation in the following re- 
spects. Capitals will be used for matrices and the transpose of a matrix will be 
indicated by the superscript T rather than by a prime. Single row or column 
matrices and vectors will be denoted by small letters. 


Received 3/17/52. 





2. Derivation of orthogonal matrix from matrix theory. Let

(2.1) $X = \| x_{1j}, x_{2j}, \cdots, x_{nj} \|$, $j = 1, 2, \cdots, N$, $n \le N$,

denote the $N$ rowed matrix of the $N$ observations on the $n$ predictor variables
$x_{ij}$. With the subscript $j$ suppressed, the observations on the $n$ original predictor
variables are treated as column vectors $x_i$, $i = 1$ to $n$.

Consider a matrix $V$ of $n$ rows and columns. The product matrix $XV^T$ will
consist of $n$ columns, each of which is a linear transform of the $n$ columns of
the data matrix $X$. Hence, if the $n$ linear functions defined in the $n$ columns are
to be orthonormal, the following matrix relation must obtain

(2.2) $(XV^T)^T(XV^T) = I$,

where $I$ denotes the identity matrix.
From general matrix theory $(XV^T)^T = VX^T$, and (2.2) can be written [4]

(2.3) $V(X^TX)V^T = I$.


For purposes of setting up an algorithm for computation of $V$ we introduce
the triangular matrices of the Gauss-Doolittle and square root methods. Using
Dwyer's notation, but replacing small letters by capitals where matrices are
referred to, the upper triangular matrix of the summation rows of the Doolittle
process is denoted by $G$, and the corresponding matrix (after division of the
$k$th row by the square root of $g_{kk}$) for the square root method is denoted by $S$
(see [1] p. 188). Let $D$ denote the diagonal matrix composed of the diagonal
elements of $G$. Then

(2.4) $S = D^{-1/2}G$.

Denoting the matrix of the moments of the predictor variables by $A$, the fact
that $S$ factors $A$ is expressed as

(2.5) $A = X^TX = S^TS$.

Substituting in (2.3) we find $V(S^TS)V^T = I$ which can be written

(2.6) $(SV^T)^T(SV^T) = I$.

A meaningful solution of this equation is given by

(2.7) $SV^T = I$, $V^T = S^{-1}$, $V = (S^T)^{-1}$,

and using (2.4) the solution can also be written as

(2.8) $V^T = G^{-1}D^{1/2}$.
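
The relations of this section can be illustrated on a small data matrix. The sketch below is not Dwyer's tabular schedule: it forms $S$ by a direct square-root (Cholesky-type) factorization of $A = X^TX$, inverts $S$ by back substitution to obtain $V^T = S^{-1}$, and then checks that the columns of $XV^T$ are orthonormal. The data matrix `X` and all function names are hypothetical, chosen only for illustration.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def square_root_factor(A):
    """Upper triangular S with A = S^T S (the square root method)."""
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    for k in range(n):
        pivot = A[k][k] - sum(S[p][k] ** 2 for p in range(k))
        S[k][k] = math.sqrt(pivot)
        for j in range(k + 1, n):
            S[k][j] = (A[k][j] - sum(S[p][k] * S[p][j] for p in range(k))) / S[k][k]
    return S


def invert_upper(S):
    """Inverse of an upper triangular matrix by back substitution."""
    n = len(S)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / S[j][j]
        for i in range(j - 1, -1, -1):
            inv[i][j] = -sum(S[i][p] * inv[p][j] for p in range(i + 1, j + 1)) / S[i][i]
    return inv


# Hypothetical data: N = 5 observations on n = 3 predictor variables.
X = [[1.0, 2.0, 0.5], [1.0, -1.0, 1.5], [1.0, 0.0, -2.0], [1.0, 3.0, 1.0], [1.0, 1.0, 0.0]]
A = matmul(transpose(X), X)      # matrix of moments, A = X^T X
S = square_root_factor(A)        # A = S^T S, as in (2.5)
VT = invert_upper(S)             # V^T = S^{-1}, as in (2.7)
Phi = matmul(X, VT)              # columns are the orthogonal predictors
gram = matmul(transpose(Phi), Phi)
for row in gram:
    print([round(v, 10) for v in row])
```

The printed Gram matrix should be the identity up to rounding, which is relation (2.2).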

3. Computational algorithm for orthogonal multipliers. Dwyer has pointed 
out that the inverse of the matrix S can be very directly computed by using the 
identity matrix for the reference matrix of the dependent variable on the right 


in a computational schedule similar to the familiar Doolittle algorithm ([1] pp.
191 and 197, explicit in Table 13.6b and implicit in Table 13.8a).

For reference purposes we reproduce the schedule for $n = 3$ in terms of
symbols, omitting the secondary subscripts:

         a11   a12   a13
    A =   *    a22   a23
          *     *    a33

         s11   s12   s13
    S =   0    s22   s23
          0     0    s33

Because of the symmetry of the $A$ matrix the computational algorithm
assures that $SV^T = I$. Denote the $n$ columns of the product matrix $XV^T$ by the
column vectors

(3.1) $\phi_1 = v_{11}x_1$, $\phi_2 = v_{21}x_1 + v_{22}x_2$, $\cdots$, $\phi_n = \sum_{i=1}^{n} v_{ni}x_i$.

These vectors constitute a set of $n$ mutually orthogonal vectors which are linear
transforms of the original set. They take the classical form, which one would
expect, and are a "normal" set, since $\phi_i^T\phi_i = 1$.

For the uninitiated who are not familiar with the square-root method, or who
prefer to use the Gauss-Doolittle method in the solution of the problem of least
squares, recall from (2.8) that $V = D^{1/2}(G^T)^{-1}$ and note that $(G^T)^{-1}$ is given by the
triangular matrix represented by the lower of the doublet of rows extending
under the identity reference matrix of the Doolittle algorithm ([1] p. 191, Table
13.6a). It follows that the triangular matrix from the upper of the doublet of
rows, which we denote by $R$, satisfies the relation

(3.2) $R = D^{1/2}V$.

Thus a set of mutually orthogonal vectors $\psi_k$ is determined by the columns of
the matrix product $XR^T$. These, although simpler to compute than the $\phi_k$,
have the disadvantage that $\psi_k^T\psi_k = g_{kk}$. One can of course compute $\phi_k$ from
$\psi_k$ by simply dividing through by $\sqrt{g_{kk}}$.


4. Accessory relations. Let $y$ denote a column vector of $N$ observations on the
variable to be fitted in a least squares problem and let $t$ denote the column
vector of the coefficients of the orthogonal predictors $\phi_k$. Recalling that $\Phi = XV^T$
represents an $n$ column, $N$ row matrix, conventional solution of the normal
equations by matrix analysis leads to [4]

(4.1) $t = \Phi^T y$, $t_k = \phi_k^T y$.

Furthermore it is easily seen from matrix analysis that

$\Phi^T X = VX^TX = VS^TS = IS = S$

and hence

(4.2) $s_{ki} = \phi_k^T x_i$.

Since the algorithm for finding $s_{k0}$ in the $y$ column is operationally the same as
that used for finding $s_{ki}$ in the $x_i$ column, it follows that

(4.3) $s_{k0} = \phi_k^T y = t_k$

and hence the fitted function $u$ is given explicitly by

(4.4) $u = \sum_k t_k\phi_k = \sum_k s_{k0}\phi_k$,
$\phi_k = v_{k1}x_1 + v_{k2}x_2 + \cdots + v_{kk}x_k$, $k = 1, 2, \cdots, n$.

Clearly if computations are based on the Doolittle algorithm and the predictors
$\psi_k$ are used, similar relations will hold:

(4.5) $g_{ki} = \psi_k^T x_i$, $i \ne 0$; $g_{k0} = \psi_k^T y$,

and the coefficients of the predicting variables $\psi_k$ will be given by

(4.6) $b_{k0} = g_{k0}/g_{kk}$.


A forward solution is furnished by solving the above relations for the
coefficients of $x_i$ in the fitted function. Denoting these coefficients by the column
vector $h$, we have $u = \Phi t = Xh$. Since $X$ is not in general a square matrix, this
equation is solved for $h$ by multiplying it by what has been called the
"pseudo-inverse" of $X$; namely, $(X^TX)^{-1}X^T$. The result is

$h = (X^TX)^{-1}(X^T\Phi t) = (X^TX)^{-1}(X^TX)V^Tt = V^Tt$.

In explicit, nonmatrix form

(4.7)
$h_1 = t_1v_{11} + t_2v_{21} + \cdots + t_nv_{n1}$
$\cdots$
$h_{n-1} = t_{n-1}v_{n-1,n-1} + t_nv_{n,n-1}$
$h_n = t_nv_{nn}$.

A useful point about this forward solution which may easily be passed over
without recognition is the following.

LEMMA. If $t_1, t_2, \cdots, t_k$ are determined as $\phi_1^Ty, \phi_2^Ty, \cdots, \phi_k^Ty$, then
$h_1, h_2, \cdots, h_n$ derived from the above schedule (4.7) will satisfy the first $k$ normal
equations of the $n$ predictor problem for arbitrary values of $t_{k+1}, t_{k+2}, \cdots, t_n$.

The proof follows by applying the first k conditions for minimizing the sum 
of the squares of the deviations to the orthogonal form of the solution (4.4). 
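
The lemma can also be seen from the triangular structure of $S$. The short derivation below is an alternative sketch, not the author's argument; it uses only $SV^T = I$, $A = X^TX = S^TS$, and $\Phi = XV^T$ from the text.

```latex
% Sketch: uses only S V^T = I, X^T X = S^T S, and \Phi = X V^T.
\begin{align*}
X^T X h &= S^T S\, V^T t = S^T t, \\
X^T y   &= S^T (S^T)^{-1} X^T y = S^T V X^T y = S^T \Phi^T y .
\end{align*}
% Hence the normal equations X^T X h = X^T y read
\[
  S^T \bigl(t - \Phi^T y\bigr) = 0 ,
  \qquad\text{i.e.}\qquad
  \sum_{j \le i} s_{ji}\,\bigl(t_j - \phi_j^T y\bigr) = 0 , \quad i = 1, \dots, n .
\]
```

Because $S^T$ is lower triangular, the $i$th equation involves only $t_1, \cdots, t_i$. The first $k$ equations therefore hold as soon as $t_j = \phi_j^Ty$ for $j \le k$, whatever values are given to $t_{k+1}, \cdots, t_n$.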

The writer has found application of this lemma to the following problem. 
Several linear functions are to be fitted to separate groups of data by least 
squares, where, say, one of the coefficients is to be determined so that it is 
optimum for all the groups lumped together, and the other coefficients are to be 
determined separately for each separate group. (One such application will be 
discussed by the author in an article which will appear shortly in the Journal of 
the American Statistical Association.) 







REFERENCES 


[1] P. S. DWYER, Linear Computations, John Wiley and Sons, 1951.

[2] P. S. DWYER, "The square root method and its use in correlation and regression,"
J. Amer. Stat. Assn., Vol. 40 (1945), pp. 493-503.

[3] J. P. GRAM, "Ueber die Entwicklung reeller Functionen in Reihen," Crelle's J. für Math.,
Vol. 94 (1883), pp. 41-73. (This is a presentation in German of his 1879 dissertation.)

[4] A. S. HOUSEHOLDER, "Some numerical methods for solving systems of linear
equations," Amer. Math. Monthly, Vol. 57 (1950), pp. 453-459.

[5] B. F. KIMBALL, "Orthogonal polynomials applied to least square fitting of weighted
observations," Ann. Math. Stat., Vol. 11 (1940), p. 352.




ON THE ASYMPTOTIC NORMALITY OF CERTAIN RANK ORDER
STATISTICS¹


By Meyer Dwass 


Northwestern University 


1. Summary. Let $(R_1, \cdots, R_N)$ be a random vector which takes on each of
the $N!$ permutations of the numbers $(1, \cdots, N)$ with equal probability, $1/N!$.
Sufficient conditions are given for the asymptotic normality of $S_N = \sum_{i=1}^{N} a_{Ni}b_{NR_i}$,
where $(a_{N1}, \cdots, a_{NN})$, $(b_{N1}, \cdots, b_{NN})$ are two sets of real numbers given for
every $N$. These sufficient conditions are apparently quite different from those
given by Wald and Wolfowitz [9] and extended by various writers [4, 7]. In some 
situations the conditions given here may be easier to apply than those given 
previously. The most general conditions available to date appear to be those of 
Hoeffding [4]. In the examples below, however, is given a case of an Sy which 
does not satisfy the conditions required by Hoeffding’s theorem but which is 
asymptotically normal by our results. 

2. Statement of theorem and its proof. We will assume hereafter that

$$\sum_{i=1}^{N} a_{Ni} = \sum_{i=1}^{N} b_{Ni} = 0, \qquad \sum_{i=1}^{N} a_{Ni}^2 = 1.$$



THEOREM. Suppose for an integer $k \ge 1$ there is a random variable $X$
satisfying the following conditions:

(a) $X$ has a continuous cdf $F(x)$,

(b) if $X_1, \cdots, X_N$ are independent random variables each with the cdf
$F(x)$ and $Z_{N1} \le \cdots \le Z_{NN}$ are the ordered values of $X_1, \cdots, X_N$ then

$$b_{Ni} = EZ_{Ni}^k - \sum_{j=1}^{N} EZ_{Nj}^k/N$$

for all $N$ and $i$.


Received 7/21/52, revised 11/24/52. 
1 Work done under the sponsorship of the Office of Naval Research while the author was 
a graduate student at the University of North Carolina. 





(c) $E|X|^{3k} < \infty$.

(d) Either $X^k$ is normal or (e) $\max_{1 \le i \le N} |a_{Ni}| \to 0$ as $N \to \infty$.
Then $S_N$ is asymptotically normally distributed.

PROOF OF THEOREM. Associate with the random vector $X_1, \cdots, X_N$ the
random vector $R_1, \cdots, R_N$ where $R_i$ = number of $X_j \le X_i$.

Let $g_N(X) = g_N(X_1, \cdots, X_N)$ be the random variable $g_N(X) = \sum_{i=1}^{N} a_{Ni}b_{NR_i}$.
Hence, for every $N$, the distribution of $g_N(X)$ is identical with that of $S_N$, for
each assumes the same set of values with the same probabilities. Write

$$g_N(X) = \sum_{i=1}^{N} a_{Ni}X_i^k - \left(\sum_{i=1}^{N} a_{Ni}X_i^k - g_N(X)\right).$$

If it can be shown that

(1) $\sum_{i=1}^{N} a_{Ni}X_i^k - g_N(X)$

converges in probability to zero, then if $\sum_{i=1}^{N} a_{Ni}X_i^k$ has a limiting distribution,
$g_N(X)$ will approach that same limiting distribution (as $N \to \infty$) ([1], p. 254).

That $\sum_{i=1}^{N} a_{Ni}X_i^k$ has a limiting normal (0, 1) distribution is seen by applying
the condition of Liapounoff that

$$\frac{\left(\sum_{i=1}^{N} |a_{Ni}|^3\, E|X^k - EX^k|^3\right)^{1/3}}{\left(E(X^k - EX^k)^2\right)^{1/2}} \to 0$$

as $N \to \infty$. This is so, since

$$\sum_{i=1}^{N} |a_{Ni}|^3 \le \max_{1 \le i \le N} |a_{Ni}| \sum_{j=1}^{N} (a_{Nj})^2 = \max_{1 \le i \le N} |a_{Ni}|.$$

To show that (1) converges in probability to zero, it will be sufficient to show
that $\lim_{N \to \infty} E\left(\sum_{i=1}^{N} a_{Ni}X_i^k - g_N(X)\right)^2 = 0$. Denote by $U_N$ the expression


N 


Uv =E (> ay;Xi — ws(¥)) = E b> ay (Xi — EX’) - on(&)) 


i=l i=1 


= E(x* — Ex")? — = ) Hy I(f N! >, av Xt — EX') II aF(c,)) 


t=! 


> arden + Egx(X) 


where the integral is over that part of the space where $R_i = r_i$ $(i = 1, \cdots, N)$
and $r_1, \cdots, r_N$ is one of the $N!$ permutations of $1, \cdots, N$ and where the
summation $\sum_r$ is over all such permutations.

By condition (b) and by the fact that $N^{-1}\sum_{i=1}^{N} EZ_{Ni}^k = EX^k$, it follows that
$U_N = E(X^k - EX^k)^2 - Eg_N^2(X)$. By straightforward algebra,

$$Eg_N^2(X) = \frac{1}{N - 1}\left(\sum_{i=1}^{N} a_{Ni}^2\right)\left(\sum_{i=1}^{N} b_{Ni}^2\right) = \frac{1}{N - 1}\sum_{i=1}^{N} (EZ_{Ni}^k)^2 - \frac{N}{N - 1}(EX^k)^2.$$

By a theorem of Hoeffding [3]

(2) $\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} (EZ_{Ni}^k)^2 = EX^{2k}$.

Hence $\lim_{N \to \infty} U_N = 0$, which proves the theorem.

3. Applications. 

EXAMPLE 1. Consider the test studied by Hotelling and Pabst [5] based on the
statistic $S_N' = \sum_{i=1}^{N} iR_i$. This statistic was shown to be asymptotically normal
in [5]. If we set $a_{Ni} = (i - (N + 1)/2)/\sqrt{N(N^2 - 1)/12}$ and $b_{Ni} = i/(N + 1) - 1/2$,
then it is easy to see that the random variable $X$ which has uniform distribution
on the unit interval satisfies the conditions of the theorem with $k = 1$. Hence
$S_N$ is asymptotically normal and therefore so is $S_N'$.

EXAMPLE 2. The statistic $S_N = \sum_{i=1}^{N} a_{Ni}EZ_{NR_i}$, where the $Z_{Ni}$ are order
statistics from a normal (0, 1) population and the $a_{Ni}$ satisfy certain conditions,
has been studied by Hoeffding and others [8] and shown to be asymptotically
normal. Our theorem shows $S_N$ to be asymptotically normal not only for the case
of normal order statistics but also when the $Z_{Ni}$ are order statistics from any
population satisfying conditions (a), (c) and (e). The last will be satisfied, for
instance, when

(3) $a_{Ni} = \sqrt{n/(mN)}$ $(i = 1, \cdots, m)$; $a_{Ni} = -\sqrt{m/(nN)}$ $(i = m + 1, \cdots, m + n)$,

where $m + n = N$ and $m$ and $n$ both approach infinity as $N$ approaches
infinity. This type of $a_{Ni}$ is commonly used in the "two-sample problem."
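
For small $N$ the properties of the weights (3) and the second-moment identity used in the proof, $ES_N^2 = \sum b_{Ni}^2/(N - 1)$ when $\sum a_{Ni} = \sum b_{Ni} = 0$ and $\sum a_{Ni}^2 = 1$, can be checked by enumerating all $N!$ rank vectors. The sketch below uses the uniform scores $b_{Ni} = i/(N + 1) - 1/2$ of Example 1; the sample sizes and the function names are hypothetical choices for illustration.

```python
import itertools
import math


def two_sample_weights(m, n):
    """Weights (3): sqrt(n/(mN)) for the first m indices, -sqrt(m/(nN)) for the rest."""
    N = m + n
    return [math.sqrt(n / (m * N))] * m + [-math.sqrt(m / (n * N))] * n


def uniform_scores(N):
    """b_{Ni} = i/(N+1) - 1/2, the centered expected uniform order statistics."""
    return [i / (N + 1) - 0.5 for i in range(1, N + 1)]


def exact_second_moment(a, b):
    """E S_N^2 over all N! equally likely rank vectors, by enumeration."""
    N = len(a)
    total = 0.0
    count = 0
    for ranks in itertools.permutations(range(N)):
        s = sum(a[i] * b[ranks[i]] for i in range(N))
        total += s * s
        count += 1
    return total / count


m, n = 2, 4                      # hypothetical small sample sizes, N = 6
a = two_sample_weights(m, n)
b = uniform_scores(m + n)
print(sum(a), sum(x * x for x in a))
print(exact_second_moment(a, b), sum(x * x for x in b) / (m + n - 1))
```

The first line printed should be close to 0 and 1, and the two numbers on the second line should agree up to rounding. Enumeration is only feasible for small $N$, so this is a check of the algebra, not of the limiting distribution.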

EXAMPLE 3. When $a_{Ni}\{\sum_{i=1}^{N} (EZ_{Ni} - \sum_{j=1}^{N} EZ_{Nj}/N)^2\}^{1/2} = EZ_{Ni} - \sum_{j=1}^{N} EZ_{Nj}/N$
and $b_{Ni} = EZ_{Ni} - \sum_{j=1}^{N} EZ_{Nj}/N$, this $S_N$ has been studied by Hoeffding [2] for
the case of $Z_{Ni}$ from a normal (0, 1) population. In this case he showed $S_N$ to
be asymptotically normal. Our theorem shows this is also true when the $Z_{Ni}$ are
order statistics from any population satisfying (a) and (c), $(k = 1)$, since (e)
holds. This is so since $\max_{1 \le i \le N} |a_{Ni}|$ is given for either the index 1 or $N$.
Assume it is $N$. We have $EZ_{NN}^j = N\int_{-\infty}^{\infty} x^jF^{N-1}(x)\,dF(x)$, $(j = 1, 2)$, and an easy
argument gives that $\lim_{N \to \infty} EZ_{NN}^2/N = 0$. This and the fact that $(EZ_{NN})^2 \le EZ_{NN}^2$
together with (2) proves the assertion. If the index is 1, the proof
is analogous.

EXAMPLE 4. When the $a_{Ni}$ are given by (3) and $b_{Ni} = i/(N + 1) - 1/2$, the
statistic $S_N$ is, for every $N$, linearly related to the Wilcoxon statistic, further
discussed by Mann and Whitney [6], which, as is well known, is asymptotically
normal. This is also seen from our theorem for reasons stated in Examples 1 and 2.







EXAMPLE 5. In a thesis by Terry [8], the statistic $\sum_{i=1}^{m} EZ_{NR_i}^2$ (where the
$Z_{Ni}$ are the order statistics from a normal (0, 1) population) is proposed against
the alternative that the $X_i$ are normal with common mean, the first $m$ having
one variance, the remaining $N - m$ another. This statistic is linearly related to
an $S_N$ where the $a_{Ni}$ are given by (3) and $b_{Ni} = EZ_{Ni}^2 - \sum_{j=1}^{N} EZ_{Nj}^2/N$. No
consideration of the asymptotic distribution of this statistic is made in [8]. We see
that this $S_N$ is asymptotically normal when the $Z_{Ni}$ are order statistics from
any population satisfying (a) and (c).

By way of example of a case not covered by earlier theorems (for instance,
see Theorem 4 of [4]) we take $S_N = \sum_{i=1}^{N} a_{Ni}EZ_{NR_i}$ where the $Z_{Ni}$ are order
statistics from a normal (0, 1) population and where condition (13) of [4] is not
satisfied. We can construct such a case as follows. Let the $a_{Ni}$ be given by (3)
but let the integer $m$ be fixed and independent of $N$. Then condition (13) of
[4] says that

(4) $$\left[n\left(-\sqrt{\frac{m}{nN}}\right)^r + m\left(\sqrt{\frac{n}{mN}}\right)^r\right] \frac{\sum_{i=1}^{N} (EZ_{Ni})^r/N}{\left(\sum_{i=1}^{N} (EZ_{Ni})^2/N\right)^{r/2}}$$

must approach zero as $N$ approaches infinity for $r = 3, 4, \cdots$. From [3] we
have that $\sum_{i=1}^{N} (EZ_{Ni})^j/N$ has for its limit the $j$th moment of a normal (0, 1)
variable. Hence for even $r$, (4) does not approach zero. However, we see from our
theorem that $S_N$ is asymptotically normal.


REFERENCES 


[1] H. CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] W. HOEFFDING, "Most powerful rank order tests of independence," unpublished paper,
1950.

[3] W. HOEFFDING, "On the distribution of the expected values of order statistics," Ann.
Math. Stat., Vol. 24 (1953), pp. 93-100.

[4] W. HOEFFDING, "A combinatorial central limit theorem," Ann. Math. Stat., Vol. 22
(1951), pp. 558-566.

[5] H. HOTELLING AND M. R. PABST, "Rank correlation and tests involving no assumption
of normality," Ann. Math. Stat., Vol. 7 (1936), pp. 29-43.

[6] H. B. MANN AND D. R. WHITNEY, "On a test of whether one of two random variables is
stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp.
50-60.

[7] G. E. NOETHER, "On a theorem by Wald and Wolfowitz," Ann. Math. Stat., Vol. 20
(1949), pp. 455-458.

[8] M. TERRY, "Some rank order tests which are most powerful against specific parametric
alternatives," University of North Carolina Library, unpublished dissertation,
1951.

[9] A. WALD AND J. WOLFOWITZ, "Statistical tests based on permutations of the
observations," Ann. Math. Stat., Vol. 15 (1944), pp. 358-372.





NOTE ON THE VARIATION OF MEANS
By Casper Goffman
Wayne University


In a manufactured product, batch to batch variations may appear, and it 
may be of interest to be able to compare these variations for different runs. 
The simplest case is that for which there is normal distribution with the same 
standard deviation for each batch, but where the mean may vary from batch to 
batch. The question arises regarding what function of the set of means should 
be taken as a measure of its variation. Thus, if $x_1, x_2, \cdots, x_n$ are independent
random variables, all with the same standard deviation, say $\sigma = 1$, and means
$\mu_1, \mu_2, \cdots, \mu_n$, the question is what function $f(\mu_1, \mu_2, \cdots, \mu_n)$ should be
taken to measure the variation of the means. We find, in this note, that if
$f(\mu_1, \mu_2, \cdots, \mu_n)$ is subjected to four conditions, three of which seem quite
natural and the fourth of which, although perhaps not so natural, has a certain
appeal, then $f(\mu_1, \mu_2, \cdots, \mu_n) = F(V)$, where $V$ is the sum of squares
$\sum_{i=1}^{n} (\mu_i - \bar{\mu})^2$, $\bar{\mu} = \sum_{i=1}^{n} \mu_i/n$. The properties we have in mind are:

(i) $f(\mu_1, \mu_2, \cdots, \mu_n)$ is continuous, nonnegative, and is equal to zero if and
only if $\mu_1 = \mu_2 = \cdots = \mu_n$.

(ii) For every $\epsilon > 0$, there is a $\delta > 0$, such that whenever $f(\mu_1, \mu_2, \cdots, \mu_n) < \delta$
then $|\mu_i - \mu_j| < \epsilon$ for every $i, j = 1, 2, \cdots, n$.

(iii) For every $\mu_1, \mu_2, \cdots, \mu_n$ and every $h$, $f(\mu_1 + h, \cdots, \mu_n + h) =
f(\mu_1, \cdots, \mu_n)$.

(iv) If $x_1, x_2, \cdots, x_n$; $x_1', x_2', \cdots, x_n'$ are normally distributed with standard
deviation $\sigma = 1$ and means $\mu_1, \cdots, \mu_n$; $\mu_1', \cdots, \mu_n'$, respectively, and if
$f(\mu_1, \cdots, \mu_n) = f(\mu_1', \cdots, \mu_n')$, then the random variables $u = f(x_1, \cdots, x_n)$
and $v = f(x_1', \cdots, x_n')$ have the same distribution function.

Condition (iv) says that the distribution of the estimate of the variation of 
means, obtained from samples, depends only upon the measures of the variation 
of the means, (assuming standard deviation 1) and upon no other aspect of the 
set of means. 

In this connection, we note that the distribution of the sum of squares of $n$ 
independent variables with means $a_1, a_2, \ldots, a_n$ depends only on the variance 
of the means, as does the power function [1] of the analysis of variance test. 
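
The remark above can be illustrated numerically. What follows is a minimal sketch, not part of the original note. It draws unit-variance normal samples around two hypothetical mean vectors that share the same $V$ and compares the average of the sample sum of squares about the sample mean. For unit variance, that average should be close to $(n - 1) + V$ in both cases. The mean vectors, trial count, seeds, and function names are arbitrary choices made for illustration.

```python
import random


def sum_sq_about_mean(values):
    """Return the sum of squared deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def simulate_mean_statistic(means, trials=20000, seed=0):
    """Average the sample sum of squares over unit-variance normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, 1.0) for mu in means]
        total += sum_sq_about_mean(sample)
    return total / trials


# Two hypothetical mean vectors with the same V = 2 but different layouts
# and different overall levels.
root3 = 3 ** 0.5
means_a = [-1.0, 0.0, 1.0]
means_b = [5.0 + 2 / root3, 5.0 - 1 / root3, 5.0 - 1 / root3]

avg_a = simulate_mean_statistic(means_a, seed=1)
avg_b = simulate_mean_statistic(means_b, seed=2)
print(sum_sq_about_mean(means_a), sum_sq_about_mean(means_b))
print(avg_a, avg_b)  # both should be near (n - 1) + V = 4
```

Comparing only the averages is a weak check. A fuller comparison would look at the whole empirical distribution of the statistic, which is what condition (iv) concerns.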

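THEOREM. If $f(\mu_1, \ldots, \mu_n)$ has properties (i)-(iv), there is a continuous $F(x)$ 
such that $F(V) = f(\mu_1, \ldots, \mu_n)$, where $V = \sum_{i=1}^{n} (\mu_i - \bar\mu)^2$, $\bar\mu = 
\sum_{i=1}^{n} \mu_i / n$. 

PROOF. Let $x_1, \ldots, x_n$; $x_1', \ldots, x_n'$ be normal, with means $\mu_1, \ldots, \mu_n$; 
$\mu_1', \ldots, \mu_n'$ and standard deviation $\sigma = 1$. Suppose 


(1) $\sum_{i=1}^{n} (\mu_i - \bar\mu)^2 \ne \sum_{i=1}^{n} (\mu_i' - \bar\mu')^2.$ 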


Received 7/10/52. 







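By property (iii), we may suppose that $\sum_{i=1}^{n} \mu_i = \sum_{i=1}^{n} \mu_i' = 0$. Then (1) 
becomes 


(2) $\sum_{i=1}^{n} \mu_i^2 \ne \sum_{i=1}^{n} \mu_i'^2.$ 

Let $\alpha = [\sum_{i=1}^{n} \mu_i^2]^{1/2}$, $\beta = [\sum_{i=1}^{n} \mu_i'^2]^{1/2}$ and suppose $\beta > \alpha$. Let $\epsilon = 
(\beta - \alpha)/(2n + 1)$. By property (ii), there is a $\delta > 0$ such that if $f(x_1, \ldots, x_n) < 
\delta$ then $|x_i - x_j| < \sqrt{n}\,\epsilon$ for all $i, j = 1, 2, \ldots, n$. Now, let $E$ be the set of 
points $(x_1, \ldots, x_n)$ for which $f(x_1, \ldots, x_n) < \delta$, and let $E_0 \subset E$ be those 
points of $E$ for which $\sum_{i=1}^{n} x_i = 0$. Then 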


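(3) $P(u < \delta) = (2\pi)^{-n/2} \int \cdots \int_E \exp\{-\tfrac{1}{2} \sum_{i=1}^{n} (\mu_i - x_i)^2\}\, dx_1 \cdots dx_n$ 
    $\ge (2\pi)^{-n/2}\, |E_0| \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}[(\alpha + n\epsilon)^2 + x^2]\}\, dx,$ 


(4) $P(v < \delta) = (2\pi)^{-n/2} \int \cdots \int_E \exp\{-\tfrac{1}{2} \sum_{i=1}^{n} (\mu_i' - x_i)^2\}\, dx_1 \cdots dx_n$ 
    $\le (2\pi)^{-n/2}\, |E_0| \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}[(\beta - n\epsilon)^2 + x^2]\}\, dx,$ 


where $|E_0|$ is the $n - 1$ dimensional measure of $E_0$. 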

That expressions (3) and (4) hold may be shown as follows: $E$ is the cylinder 
whose axis is the line $x_1 = x_2 = \cdots = x_n$. The $n$-tuple integrals may be 
evaluated by letting $x$ be measured along this axis and by integrating, for each 
$x$, over the hyperplane normal to the axis at $x$, and then by integrating with 
respect to $x$. It follows that $P(u < \delta) \ge (2\pi)^{-n/2}\, |E_0| \int_{-\infty}^{\infty} \varphi(x)\, dx$ and 
$P(v < \delta) \le (2\pi)^{-n/2}\, |E_0| \int_{-\infty}^{\infty} \psi(x)\, dx$, where $\varphi(x) \le \exp\{-\tfrac{1}{2} \sum_{i=1}^{n} (\mu_i - x_i)^2\}$ and 
$\psi(x) \ge \exp\{-\tfrac{1}{2} \sum_{i=1}^{n} (\mu_i' - x_i')^2\}$ for all $(x_1, \ldots, x_n)$ and $(x_1', \ldots, x_n')$ in $E$ whose pro- 
jections on the axis $x_1 = x_2 = \cdots = x_n$ fall at $x$. Now, for every such 
$(x_1, \ldots, x_n)$, $\sum_{i=1}^{n} (\mu_i - x_i)^2 \le x^2 + (\alpha + n\epsilon)^2$, since the vector whose compo- 
nents are $\mu_i - x_i$, $i = 1, \ldots, n$, has $x$ as one orthogonal component, the other 
of which is not greater than the sum of the distances of $(\mu_1, \ldots, \mu_n)$ and 
$(x_1, \ldots, x_n)$ from the line $x_1 = x_2 = \cdots = x_n$; but this is readily seen not 
to exceed $\alpha + n\epsilon$. Similarly, $\sum_{i=1}^{n} (\mu_i' - x_i')^2 \ge x^2 + (\beta - n\epsilon)^2$. Accordingly, 
we may take $\varphi(x) = \exp\{-\tfrac{1}{2}[(\alpha + n\epsilon)^2 + x^2]\}$ and $\psi(x) = \exp\{-\tfrac{1}{2}[(\beta - n\epsilon)^2 + x^2]\}$. Moreover, 

(5) $|E_0| > 0,$ 

for, since $f(x_1, \ldots, x_n)$ is continuous, there is a sphere $S$ of radius less than $\epsilon$, 
containing $(0, 0, \ldots, 0)$, for every point $(x_1, \ldots, x_n)$ of which $f(x_1, \ldots, x_n) < \delta$. 
The set $S_0 \subset S$, of points $(x_1, \ldots, x_n) \in S$ for which $\sum_{i=1}^{n} x_i = 0$, is a subset 
of $E_0$ of positive $n - 1$ dimensional measure. But $\epsilon = (\beta - \alpha)/(2n + 1)$ implies 







$\beta - n\epsilon > \alpha + n\epsilon$. It then follows by (3), (4), and (5) that if $\mu_1, \ldots, \mu_n$; 
$\mu_1', \ldots, \mu_n'$ satisfies (1), then $P(u < \delta) > P(v < \delta)$. Hence, by property (iv), 
$f(\mu_1, \ldots, \mu_n) \ne f(\mu_1', \ldots, \mu_n')$. 

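On the other hand, let $\mu_1, \ldots, \mu_n$; $\mu_1', \ldots, \mu_n'$ be such that 


(6) $V = \sum_{i=1}^{n} (\mu_i - \bar\mu)^2 = \sum_{i=1}^{n} (\mu_i' - \bar\mu')^2.$ 


Suppose $a = f(\mu_1, \ldots, \mu_n)$, $b = f(\mu_1', \ldots, \mu_n')$, and that $a \ne b$. Let $C_1$ and $C_2$ 
be continuous curves joining $(\mu_1, \ldots, \mu_n)$ to $(\mu_1', \ldots, \mu_n')$ such that for every 
$(\mu_1'', \ldots, \mu_n'') \in C_1$, not an end-point, $\sum_{i=1}^{n} (\mu_i'' - \bar\mu'')^2 < V$, and for every 
$(\mu_1''', \ldots, \mu_n''') \in C_2$, not an end-point, $\sum_{i=1}^{n} (\mu_i''' - \bar\mu''')^2 > V$. Since 
$f(\mu_1, \ldots, \mu_n)$ is continuous, there are points $(\mu_1'', \ldots, \mu_n'') \in C_1$ and 
$(\mu_1''', \ldots, \mu_n''') \in C_2$ for which 


(7) $f(\mu_1'', \ldots, \mu_n'') = f(\mu_1''', \ldots, \mu_n''') = \tfrac{1}{2}(a + b).$ 


But (7) contradicts the fact, already established, that $\sum_{i=1}^{n} (\mu_i - \bar\mu)^2 \ne 
\sum_{i=1}^{n} (\mu_i' - \bar\mu')^2$ implies $f(\mu_1, \ldots, \mu_n) \ne f(\mu_1', \ldots, \mu_n')$. We have now proved 
that $f(\mu_1, \ldots, \mu_n) = f(\mu_1', \ldots, \mu_n')$ if and only if $\sum_{i=1}^{n} (\mu_i - \bar\mu)^2 = 
\sum_{i=1}^{n} (\mu_i' - \bar\mu')^2$. But this is simply another way of saying that there is an $F(x)$ 
such that $F(V) = f(\mu_1, \ldots, \mu_n)$. 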

Conversely, it is easy to prove that: 

If $F(x)$ is continuous, monotonically increasing, and $F(0) = 0$, then 
$f(\mu_1, \ldots, \mu_n) = F(V)$ has properties (i)-(iv). 
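
As a small illustration of the converse, the sketch below takes the hypothetical choice $F(x) = \sqrt{x}$, which is continuous, increasing, and zero at zero. It then checks two of the four properties on sample inputs: the zero case of (i) and the translation invariance (iii). The function names and test vectors are assumptions made for this example; the check is numerical and is not a proof.

```python
import math


def variation(means):
    """V: sum of squared deviations of the means from their average."""
    m = sum(means) / len(means)
    return sum((v - m) ** 2 for v in means)


def f(means, F=math.sqrt):
    """Measure of variation f = F(V) for a chosen increasing F with F(0) = 0."""
    return F(variation(means))


equal_means = [3.0, 3.0, 3.0]
spread_means = [1.0, 2.0, 4.0]
shifted_means = [v + 10.0 for v in spread_means]

print(f(equal_means))                        # zero when all means coincide
print(f(spread_means), f(shifted_means))     # equal up to rounding
```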

REFERENCE 
[1] P. C. Tang, “The power function of the analysis of variance test with tables and il- 
lustrations of their use,” Stat. Res. Memoirs, Vol. 2 (1938), pp. 126-157. 




ON MILL’S RATIO FOR THE TYPE III POPULATION 
By Des Raj¹ 


University of Lucknow 


1. Introduction and summary. Mills [1], Gordon [2], Birnbaum [3], and the 
author [4] have studied the ratio of the area of the standardized normal curve 
from $x$ to $\infty$ and the ordinate at $x$. The object of this note is to establish the 
monotonic character of, and to obtain lower and upper bounds for, the ratio of 
the ordinate of the standardized Type III curve at $x$ and the area of the curve 
from $x$ to $\infty$. This ratio, as shown by Cohen [5] and the author [6], has to be 
calculated for several values of x when solving approximately the equations 
involved in the problem of estimating the parameters of Type III populations 
from truncated samples. It was found by the author that, for large values of 


Received 2/28/52, revised 12/31/52. 
1 Now at the Indian Statistical Institute. 







x, when the ordinates and areas are small, either this ratio cannot be obtained 
from existing tables prepared by Salvosa [7] or that very few significant digits 
are available for its calculation. It was thus found desirable to obtain lower 
and upper bounds which could satisfactorily locate this ratio. The monotonic 
behavior of this ratio and the inequalities obtained may also prove useful in 
checking the accuracy of the tables in [7], and in studying the nature of the 
tail of the chi square distribution. 


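2. Derivations. The standardized Type III population is given by 

(1) $C f(x)\, dx, \quad -2/\alpha_3 < x < \infty, \quad 0 \le \alpha_3 < 2,$ 

where 

(2) $f(x) = \left(1 + \tfrac{\alpha_3}{2} x\right)^{4/\alpha_3^2 - 1} e^{-2x/\alpha_3},$ 

and 

$C = (4/\alpha_3^2)^{4/\alpha_3^2 - 1/2}\, e^{-4/\alpha_3^2}\, [\Gamma(4/\alpha_3^2)]^{-1}.$ 

We define 

(3) $\mu(x) = f(x)[G(x)]^{-1},$ 

where 

(4) $G(x) = \int_x^{\infty} f(t)\, dt.$ 

We have 

$\mu(-2/\alpha_3) = 0, \quad \mu(\infty) = 2/\alpha_3,$ 

and 

(5) $\frac{d}{dx}\mu(x) = \mu(x)[G(x)]^{-1}\varphi_1(x),$ 

where 

(6) $\varphi_1(x) = f(x) - \left(x + \tfrac{\alpha_3}{2}\right)\left(1 + \tfrac{\alpha_3}{2} x\right)^{-1} G(x).$ 

Since 

$\varphi_1(-2/\alpha_3) = \infty, \quad \varphi_1(\infty) = 0,$ 

and 

$\frac{d}{dx}\varphi_1(x) = -\left(1 + \tfrac{\alpha_3}{2} x\right)^{-2}\left(1 - \tfrac{\alpha_3^2}{4}\right) G(x) \le 0,$ 

it follows that $\mu(x)$ is monotonically increasing and that 

(7) $\mu(x) \ge \mu_1(x) = \left(x + \tfrac{\alpha_3}{2}\right)\left(1 + \tfrac{\alpha_3}{2} x\right)^{-1}.$ 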






Again, considering 

(8) $\varphi_2(x) =$ [expression illegible in source], 

we have 

$\varphi_2(-2/\alpha_3) = \infty, \quad \varphi_2(\infty) = 0,$ 

and 

$\frac{d}{dx}\varphi_2(x) =$ [expression illegible in source], 

so that 

(9) $\mu(x) \le \mu_2(x).$ 

Combining (7) and (9) we have the inequalities 
$\mu_1(x) \le \mu(x) \le \mu_2(x).$ 


TABLE I 
Values of $\mu(x)$, $\mu_1(x)$, $\mu_2(x)$, and $\mu_3(x)$ 
(the $x$ and $\mu_2(x)$ columns are illegible in the source) 

  $\mu_1(x)$   $\mu(x)$   $\mu_3(x)$ 
  0.000      0.692      0.869 
  0.500      0.901      1.000 
  0.800      1.059      [illegible] 
  1.000      1.180      1.215 
  1.143      1.275      1.298 
  1.250      1.351      1.366 
  1.333      1.413      1.423 
  1.400      1.464      1.472 
  1.455      1.507      1.513 
  1.500      1.544      1.549 


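A better estimate for the upper inequality can be obtained from Jensen’s 
inequality 

(10) $\varphi\left[\int_a^b t\, g(t)\, dt \Big/ \int_a^b g(t)\, dt\right] \le \int_a^b \varphi(t) g(t)\, dt \Big/ \int_a^b g(t)\, dt,$ 

where $\varphi(t)$ is convex and $g(t) \ge 0$ in $(a, b)$. Setting $a = x$, $b = \infty$, $\varphi(t) = 
(1 + \alpha_3 t/2)(t + \alpha_3/2)^{-1}$, and $g(t) = (t + \alpha_3/2)(1 + \alpha_3 t/2)^{4/\alpha_3^2 - 2}\, e^{-2t/\alpha_3}$ 
in (10), we have 

(11) $-\left(1 + \tfrac{\alpha_3}{2} x\right)\mu^2(x) + x\mu(x) + 1 \ge 0,$ 

from which it follows that 

(12) $\mu(x) \le \mu_3(x) = \dfrac{x + \sqrt{x^2 + 4(1 + \alpha_3 x/2)}}{2(1 + \alpha_3 x/2)}.$ 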


As a check on our results, by putting $\alpha_3 = 0$ in (7), (9) and (12), we obtain 
inequalities given in [2], [3] and [4]. Incidentally, the function 

$\varphi_3(x) = \left(1 + \tfrac{\alpha_3}{2} x\right)\mu(x) - x,$ 

used by Cohen [5] can be shown to be monotonically decreasing. For, 

$\varphi_3(-2/\alpha_3) = 2/\alpha_3, \quad \varphi_3(\infty) = 0,$ 

and 

$\frac{d}{dx}\varphi_3(x) = \left(1 + \tfrac{\alpha_3}{2} x\right)\mu^2(x) - x\mu(x) - 1 \le 0$ from (11). 

The closeness with which these inequalities can locate $\mu(x)$ is illustrated by 
Table I, where $\mu(x)$ is calculated from the tables in [7]. 
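
The legible entries of Table I can be cross-checked with a short computation. The sketch below is an illustration only and rests on three assumptions. First, it takes $\alpha_3 = 1$ and $x = -0.5, 0, \ldots, 4.0$; these values are not stated in the surviving text and are inferred from the tabulated lower-bound column. Second, it reads the lower bound as $\mu_1(x) = (x + \alpha_3/2)/(1 + \alpha_3 x/2)$ and the Jensen bound as $\mu_3(x) = [x + \sqrt{x^2 + 4(1 + \alpha_3 x/2)}]/[2(1 + \alpha_3 x/2)]$. Third, it uses an integer shape $4/\alpha_3^2$, so that the tail area reduces to a finite sum instead of being read from the tables in [7]. All function names are hypothetical.

```python
import math


def mu_exact(x, shape):
    """mu(x) = f(x)/G(x) for the standardized Type III curve.

    Assumes an integer shape = 4 / alpha3**2, so that the tail area of the
    underlying gamma variate is a finite sum.
    """
    y = shape + math.sqrt(shape) * x  # gamma variate with unit rate
    density = y ** (shape - 1) * math.exp(-y) / math.factorial(shape - 1)
    survival = math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(shape))
    # The factor sqrt(shape) is dy/dx, converting the hazard in y to one in x.
    return math.sqrt(shape) * density / survival


def mu_lower(x, alpha3):
    """Lower bound mu_1(x) as read from inequality (7)."""
    return (x + alpha3 / 2) / (1 + alpha3 * x / 2)


def mu_upper_jensen(x, alpha3):
    """Upper bound mu_3(x) as read from inequality (12)."""
    w = 1 + alpha3 * x / 2
    return (x + math.sqrt(x * x + 4 * w)) / (2 * w)


alpha3 = 1.0
shape = 4  # equals 4 / alpha3**2 for the assumed alpha3
rows = []
for step in range(10):
    x = -0.5 + 0.5 * step
    rows.append((x, mu_lower(x, alpha3), mu_exact(x, shape), mu_upper_jensen(x, alpha3)))

for x, lower, exact, upper in rows:
    print(f"{x:5.1f} {lower:7.3f} {exact:7.3f} {upper:7.3f}")
```

Under these assumptions the three computed columns agree with the legible table entries to three decimals, and the exact ratio lies between the two bounds at every grid point.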


REFERENCES 

[1] J. P. Mills, “Table of the ratio: area to bounding ordinate, for any portion of the 
normal curve,” Biometrika, Vol. 18 (1926), pp. 395-400. 

[2] R. D. Gordon, “Values of Mill’s ratio of area to bounding ordinate of the normal 
probability integral for large values of the argument,” Ann. Math. Stat., Vol. 12 
(1941), pp. 364-366. 

[3] Z. W. Birnbaum, “An inequality for Mill’s ratio,” Ann. Math. Stat., Vol. 13 (1942), 
pp. 245-246. 

[4] Des Raj, “On estimating the parameters of normal populations from singly truncated 
samples,” Ganita, Vol. 3 (1952), pp. 41-57. 

[5] A. C. Cohen, “Estimating parameters of Pearson Type III populations from truncated 
samples,” J. Amer. Stat. Assn., Vol. 45 (1950), pp. 411-423. 

[6] Des Raj, “On estimating the parameters of Type III populations from truncated 
samples,” J. Amer. Stat. Assn., to be published. 

[7] L. R. Salvosa, “Tables of Pearson’s Type III function,” Ann. Math. Stat., Vol. 1 
(1930), appended. 





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


Dr. Om P. Aggarwal, formerly Research Assistant at Stanford, received his 
Ph.D. degree in statistics from Stanford University and is now Assistant Pro- 
fessor in the Department of Mathematics, University of Washington, Seattle. 

Dr. P. H. Anderson has accepted a position as a price economist with the 
Office of Price Stabilization, Washington, D. C. 

Mr. George F. Cramer has accepted a position as Staff Scientist with the re- 
cently established Computing Center of the Engineering Research Associates 
Division of Remington Rand Inc. at Arlington, Virginia. 

Dr. Harald Cramér, Rector of the University of Stockholm, has accepted a 
Visiting Professorship in the Department of Mathematics, University of Cali- 
fornia, Berkeley, for the Fall Semester, 1953, and will be in the Statistical Labo- 
ratory beginning the middle of July. 

Mr. Richard DeLancie has been released from active duty as a Lieutenant 
Commander in the Naval Reserves to accept a civilian position as an Operations 
Analyst with Headquarters, Western Air Defense Force, Hamilton Air Force 
Base, California. 

Dr. Francis W. Dresch, formerly Director of Computation and Ballistics, 
U. S. Naval Proving Ground, Dahlgren, Virginia, is working in the Economics 
Division, Operations Research, and acting as consultant on statistics, math- 
ematics and computing machinery at Stanford Research Institute, Stanford, 
California. 

Dr. Meyer Dwass, formerly with the Office of the Assistant Director for 
Statistical Standards, U. S. Bureau of the Census, has joined the staff of the 
Department of Mathematics, Northwestern University as an Assistant Professor. 

Professor Evelyn Fix was granted a leave of absence January 5 to March 31, 
1953 from the Statistical Laboratory, University of California, Berkeley to 
teach at the Sampling Demonstration Center for the Government of Thailand 
under the auspices of the Food and Agriculture Organization of the United 
Nations. 

Professor H. M. Gehman, Chairman of the Department of Mathematics of 
the University of Buffalo, has been serving as Acting Dean of the Graduate 
School of Arts and Sciences during the first semester of 1952-53. 

Dr. T. N. E. Greville of the National Office of Vital Statistics, Public Health 
Service, has been appointed a technical adviser in Vital and Health Statistics 
at the Institute of Inter-American Affairs, Rio de Janeiro, Brazil. 

Mr. Charles H. Hubbell has completed a tour of active duty with Project 
Scoop, HQ, USAF and is now a research assistant in the Decision Processes 
Research Project of the Ford Foundation at the University of Michigan. 

Dr. Gopinath Kallianpur has returned to India after spending the last aca- 
demic year at the Institute for Advanced Study, Princeton, New Jersey. 







Mr. Wharton F. Keppler, formerly Statistical Analysis Director for M. & R. 
Dietetic Laboratories, Inc., has been since March, 1952, Staff Analytical Statis- 
tician (Engineering) to the Quality Control Directorate, Central Air Procure- 
ment District, USAF, Detroit, Michigan. 

Mr. Nathan Keyfitz has been granted a year’s leave of absence by the Do- 
minion Bureau of Statistics in Ottawa to serve as statistical consultant in Indo- 
nesia under the United Nations Technical Assistance Program. 

Mr. Gunnar Kulldorf from the University of Lund, Sweden, was awarded a 
Fellowship from the Sweden-America Foundation and has spent the academic 
year 1952-53 in the Statistical Laboratory, University of California, Berkeley, 
and in the Department of Mathematical Statistics, University of North Caro- 
lina, Chapel Hill. 

Dr. Charles R. Langmuir has been granted a leave from Syracuse University, 
and has been elected a member and director of the Educational Research Corpora- 
tion and appointed Research Associate. 

Mr. R. B. Murphy, formerly at Carnegie Institute of Technology, is now asso- 
ciated with the Bell Telephone Laboratories, Inc. 

Dr. Donald B. Owen, formerly Instructor in mathematics at the University of 
Washington, has joined the staff of the Mathematics Department at Purdue 
University as an Assistant Professor and Consultant in the Statistical Labora- 
tory. 

Dr. Stefan Peters has accepted the position of actuary with Morss and Seal, 
Consulting Actuaries, New York. 

Mr. Sixto Rios, Catedratico of Mathematical Statistics, University of Madrid, 
has been appointed as Director of the Statistical School, University of Madrid. 

Dr. Murray Rosenblatt, Assistant Professor of Statistics at the University 
of Chicago, will visit the Institute of Mathematical Statistics in Stockholm 
from April to December, 1953. He will continue research with Ulf Grenander on 
time series analysis initiated when the latter spent the year 1951-52 as Visiting 
Assistant Professor at Chicago. 

Professor Henry Scheffé, Department of Mathematical Statistics, Columbia 
University, has accepted the position of Professor and Assistant Director of the 
Statistical Laboratory, University of California, Berkeley, beginning with the 
Fall term 1953. 

Dr. R. W. Shephard of the Rand Corporation has accepted a position with 
the Sandia Corporation, Albuquerque, New Mexico. 

Professor Jack Silber of Roosevelt College has returned from a tour of duty as 
Operations Analyst with the Fifth Air Force in Korea. 

Mr. Oliver A. Shaw, recalled to active duty with the Air Force in 1951, is now 
stationed at Mather AFB, California. Major Shaw was teaching mathematics 
at the University of Mississippi at the time he was recalled to active duty. 

Mr. John R. Sullivan has returned to his position as Assistant Professor of 
Mathematics at Clemson College, South Carolina, after a two-year leave of 
absence for graduate study in the Department of Mathematical Statistics at the 
University of North Carolina. 







Summer Statistical Seminar at the University of Connecticut 


The University of Connecticut will hold its fourth annual Summer Seminar 
in Statistics from August 10 through 28, 1953. The general plan provides for 
one or two seminar sessions daily and a clinic on the treatment of problems in 
application. 

The subjects for discussion together with their organizers are: August 10-14, 
“Statistical Methodology in Physics,” Dr. E. W. Pike with Dr. Churchill Eisen- 
hart and Dr. E. P. King; August 17-21, “Statistics in Biometry and Medicine,” 
Professor G. Beall with Dr. I. Bross and Dr. D. Mainland; August 24-28, (i) 
“ASA Handbook,” Professor F. Mosteller, (ii) “Performance and Reliability of 
Complex Mechanical Assemblies,” Professor G. H. Shortley. 

Persons interested are invited to attend any or all sessions. For more detailed 
information, write the Secretary of the Seminar, Professor Geoffrey Beall, 
Statistical Laboratory, University of Connecticut, Storrs, Connecticut. 


Doctoral Dissertations in Statistics, 1952 


Listed below are the doctorates conferred during the year 1952 in the United 
States and Canada for which the dissertations were written on topics in statistics 
(or for a degree in statistics). The university, month in which degree was con- 
ferred, major subject, minor subject, and the title of the dissertation are given 
in each case if available. 

A. G. Anderson, Michigan, February, major in statistics, “The Prediction of 
Quantitative Characteristics in Polygenic Systems.” 

J. B. Bartoo, State University of Iowa, June, major in mathematics, “Certain 
Theorems on Order Statistics.” 

C. A. Bennett, Michigan, June, major in statistics, “Asymptotic Properties 
of Ideal Linear Estimators.” 

W. H. Clatworthy, North Carolina, August, major in mathematical statistics, 
“Partially Balanced Incomplete Block Designs with r < k.” 

E. L. Cox, North Carolina, major in experimental statistics, “On Estimating 
Size of Biological Populations.” 

M. Dwass, North Carolina, August, major in mathematical statistics, “On 
the Large Sample Power of Certain Rank Order Tests.” 

F. E. Freund, Pittsburgh, August, “Some Methods of Estimating Prior Proba- 
bilities from Heterogeneous Populations.” 

S. G. Ghurye, North Carolina, August, major in mathematical statistics, 
“Some Problems in the Theory of Stochastic Difference Equations.” 

F. A. Graybill, Iowa State College, June, major in statistics, “On Quadratic 
Estimates of Variance Components.” 

W. C. Guenther, Washington, December, major in mathematical statistics, 
“On Testing Whether or Not a Given Percentile of One Distribution is Less 
Than or Equal to a Given Percentile of Another Distribution.” 

K. D. C. Haley, Stanford, major in statistics, minor in mathematics, 
“Estimation of the Dosage Mortality Relationship When the Dose is Subject to 
Error.” 







A. T. James, Princeton, October, major in statistics, “Group Methods in 
Normal Multivariate Distribution Theory.” 

T. A. Jeeves, California (Berkeley), June, major in statistics, “Identifiability 
and Almost-Sure Estimability of Linear Structures in n-Dimensions.” 

J. C. Kiefer, Columbia, May, major in mathematical statistics, “Contributions 
to the Theory of Games and Statistical Decision Functions.” 

L. M. Le Cam, California (Berkeley), June, major in statistics, “On Some 
Asymptotic Properties of Maximum Likelihood Estimates and Related Bayes’ 
Estimates.” 

M. R. Mickey, Jr., Iowa State College, July, major in statistics, “An Applica- 
tion of Sequential Tests to a Problem of Quality Control.” 

W. J. Moonan, Minnesota, December, major in statistics, “The Generalization 
of the Principles of Some Modern Experimental Designs for Psychological 
and Educational Research.” 

H. Raiffa, Michigan, February, major in statistics, “Arbitration Systems for 
Generalized Two-Person Games.” 

N. Rudy, Chicago, August, major in statistics, “Some Problems in the Eco- 
nomics of Industrial Sampling Inspection.” 

A. R. Sen, North Carolina, major in experimental statistics, “Further Develop- 
ments of the Theory and Application of the Selection of Primary Sampling 
Units, with Special Reference to the North Carolina Agricultural Population.” 

K. W. Smillie, University of Toronto, June, major in statistics, “A Mathe- 
matical Treatment of Certain Movements of Fish—An Application of the 
Theory of Markov Processes.” 

R. F. Tate, California (Berkeley), June, major in statistics, “Contributions 
to the Theory of Random Numbers of Random Variables.” 

D. J. Thompson, Iowa State College, August, major in statistics, “A Theory 
of Sampling Finite Universes with Arbitrary Probabilities.” 

C. K. Tsao, Oregon, June, major in mathematical statistics, “A General Class 
of Discrete Distributions and Mixtures of Distributions.” 

G. S. Watson, North Carolina, major in experimental statistics, “Serial 
Correlation in Regression Analysis.” 

R. E. Wheeler, Kentucky, June, “A Variable Probability Distribution 
Function.” 


P. Whidden, Carnegie Institute of Technology, June, major in mathematics, 
“A Criterion for Measuring Closeness of Probability Distributions.” 




The following names were omitted from the 1951 list of doctoral dissertations 
in statistics. 

R. W. Allen, St. Louis, June, 1951, “Compound Statistical Distribution 
Functions.” 

W. W. Jacobs, George Washington University, May, 1951, major in mathe- 
matical statistics, “Random Matrices.” 







New Members 


The following persons have been elected to membership in the Institute 


(November 27, 1952 to February 18, 1953) 


BARTOO, JAMES B., Ph.D. (State Univ. Iowa), Assistant Professor, Department of Mathe- 
matics, Pennsylvania State College, State College, Pennsylvania. 

BERSHAD, MAX A., B.S. (College of the City of New York), Statistician, Statistical Research 
Section, Bureau of the Census, 1667 Ft. Du Pont St., S.E., Washington 20, D. C. 

BINGHAM, RICHARD STEPHEN JR., B.S. (Carnegie Inst. of Tech.), Quality Control Super- 
visor, Atlas Powder Company, Volunteer Ordnance Works, 1413 Wright Street, Chat- 
tanooga, Tennessee. 

CLARKE, A. BRUCE, Ph.D. (Brown Univ.), Instructor, Department of Mathematics, Uni- 
versity of Michigan, Ann Arbor, Michigan. 

DEEKS, HERBERT W. G., M.S. (Univ. of London), Statistician to the Directorate General 
of Works in the Ministry of Works, London, Eversfield Red Hill, Chislehurst Kent, 
England. 

EICH, EDWARD D., B.S. (Mass. Inst. of Tech.), Electrical Engineer, Anaconda Wire & 
Cable Company, Anaconda Wire & Cable Company, Hastings-on-Hudson, New York. 

EKBLOM, STAFFAN, F.K. (Univ. of Stockholm), Research Assistant, Norrtullsgatan 16, 
Stockholm, Sweden. 

EKLIND, JAN-ROBERT, F.K. (Univ. of Stockholm), Instructor in Mathematical Statistics, 
University of Stockholm, Norrtullsgatan 16, Stockholm, Sweden. 

FEDERSPIEL, CHARLES F., A.M. (Univ. of Michigan), Statistician, Communicable Disease 
Center, U. S. Public Health Service, 169 Eighth St. N.E., Atlanta 5, Georgia. 

GILBERT, JOHN P., B.A. (St. John’s College), Supervisor of Computers, Statistical Research 
Center, University of Chicago, 5719 Dorchester Avenue, Chicago 37, Illinois. 

HOPPER, GRACE M., Ph.D. (Yale Univ.), Systems Engineer, Remington Rand Inc., Walnut 
Park Plaza, Walnut at 63rd St., Philadelphia 39, Pennsylvania. 

HOY, WALTER W., M.A. (Ohio State Univ.), Graduate Student, Ohio State University, 
554 E. Norwich Avenue, Columbus 1, Ohio. 

HUSEIN, HASAN M., Ph.D. (Leeds Univ.), Professor of Statistics, Faculty of Commerce, 
Fouad University, Cairo, 28, Mobtadayan Street, Cairo, Egypt. 

KEMPERMAN, JOHAN H. B., Ph.D. (Univ. of Amsterdam), Visiting Professor, Department 
of Mathematics, Purdue University, West Lafayette, Indiana. 

MCQUAID, GERTRUDE A., A.B. (Hunter College), Graduate Student, New York University, 
3034 Grand Concourse, New York 58, New York. 

MURPHY, JOHN E., M.A. (Columbia Univ.), Statistical Analyst, Bristol-Myers Products 
Division, c/o Bristol-Myers Products Division, 680 Fifth Avenue, New York 20, New 
York. 

PENG, KAN-CHEN, M.S. (Univ. of Michigan), Statistical Quality Control Engineer, Parke, 
Davis & Company, 823 Seward, Detroit, Michigan. 

PILLAI, K. C. SREEDHARAN, M.S. (Univ. of Travancore), Research Associate in Statistics 
and Graduate Student, Department of Statistics, University of North Carolina, Chapel 
Hill, North Carolina. 

PRICE, BRUCE P., M.M. (Cincinnati Conservatory of Music), Research Analyst, Engineer- 
ing Department, Analytical Mechanics Section, Southwest Research Institute, 24 
Future, San Antonio 2, Texas. 

RICHARDS, ROBERT G., M.A. (Univ. of Calif., Berkeley), Mathematician, Department of 
the Army at Redstone Arsenal, Y. M. C. A., Huntsville, Alabama. 

ROBINSON, L. V., Ph.D. (Harvard Univ.), Mathematician, Wright-Patterson Air Base, 
373 West First Street, Dayton, Ohio. 







STACEY, ALEC G., Supervisor, Special Surveys Division, Regional Office Dominion Bureau 
of Statistics, St. John’s, 75 St. Clare Avenue, St. John’s, Newfoundland, Canada. 

WORTHAM, A. W., M.S. (Oklahoma Agric. and Mech. College), Senior Project Analytical 
Engineer, Chance Vought Aircraft, Chance Vought Aircraft, Structures Section, Dallas, 
Texas. 




PUBLICATIONS RECEIVED 


CLARK, C. E., An Introduction to Statistics, John Wiley and Sons, Inc., New York, 1953, 
x + 266 pp., $4.25. 

COCHRAN, W. G., Sampling Techniques, John Wiley and Sons, Inc., New York, 1953, $6.50. 

DOOB, J. L., Stochastic Processes, John Wiley and Sons, Inc., New York, 1953, vii + 654 pp., 
$10.00. 

FINNEY, D. J., An Introduction to Statistical Science in Agriculture, John Wiley and Sons, 
Inc., New York, 1953, 179 pp., $3.75. 

HARTREE, D. R., Numerical Analysis, Oxford University Press, New York, 1953, xiv + 
287 pp., $6.00. 

Recenseamento Geral do Brasil (1° de Setembro de 1940) Censo Demografico and Censos Eco- 
nomicos, Servico Grafico de Instituto Brasileiro de Geografia e Estatistica, Rio de 
Janeiro, 1950. (7 volumes in addition to those listed in March and September 1952 
and March 1953.) 

Recenseamento Geral do Brasil (1.° de Setembro de 1940) Censo Demografico e Censos Econo- 
micos, Servico Grafico de Instituto Brasileiro de Geografia e Estatistica, Rio de Ja- 
neiro, 1951. 

SHEPHARD, R. W., Cost and Production Functions, Princeton University Press, Princeton, 
1953, vii + 104 pp., $2.00. 

THRON, W. J., Introduction to the Theory of Functions of a Complex Variable, John Wiley 
and Sons, Inc., New York, 1953, ix + 230 pp., $6.50. 

TINBERGEN, J., On the Theory of Economic Policy, North-Holland Publishing Co., Amster- 
dam, 1953, iii + 78 pp., $1.80. 

VINCI, FELICE, Interventi sul nuovo assetto Europeo, sull’accertamento della miseria, sulle 
ricerche econometriche e sull’illusione finanziaria, (Istituto di Scienze Economiche e 
Statistiche, Quaderni XVI), Milan, 1952, 22 pp. 





TRABAJOS DE ESTADISTICA 


Review published by “Departamento de Estadistica” of the “Consejo Superior de 
Investigaciones Cientificas,” Madrid, Spain. 


Vol. IV CONTENTS Cuad. I 


R. FORTET ..........Procesos estocásticos en cascada. 
S. RIOS ..........Algunas leyes de probabilidad y procesos estocásticos que se reducen 
a un tipo general de Laplace-Stieltjes. 
J. GIL PELAEZ ..........Las funciones absolutas en la Estadistica. 
A. N. KOLMOGOROFF ..........Sucesiones estacionarias en espacios de Hilbert. 
J. TENA ..........Sobrevisién por muestreo en la Universidad de Madrid. 


Notas. Crónicas. Bibliografia. Cuestiones. 


For everything in connection with works, exchanges and subscription write to Prof. 
Sixto Rios. Departamento de Estadistica del Consejo Superior de Investigaciones 
Cientificas, Serrano 123, Madrid, Spain. The Review is composed of three fascicles 
published quarterly (about 350 pages) and its price is 80 pts. for Spain and South- 
America and 3 American Dollars for all other countries. 


JOURNAL OF THE 
AMERICAN STATISTICAL ASSOCIATION 
June, 1953 


1108 16th St., N.W. Washington 6, D. C. VOL. 48 NO. 262 


The Post-Enumeration Survey of the 1950 Census: A Case History in Survey Design 
ELI S. MARKS, W. PARKER MAULDIN, AND HAROLD NISSELSON 
On the Distinction between Enumerative and Analytic Surveys W. EDWARDS DEMING 
The Up-and-down Method with Small Samples 
K. A. BROWNLEE, J. L. HODGES, JR., AND MURRAY ROSENBLATT 
A Multiple Group Least Squares Problem and the Significance of the Associated Orthogonal 
Polynomials BRADFORD F. KIMBALL 
Wesley Clair Mitchell: The Economic Scientist ADOLF A. BERLE 
The Velocity of Time Deposits (three charts) GEORGE GARVY 
Changes in the Functional Distribution of Income JESSE BURKHEAD 
(five charts under separate cover) 
On Errors in Matrix Inversion PAUL S. DWYER AND FREDERICK V. WAUGH 
Designing Single-sampling Inspection Plans when the Sample Size is Fixed (eight charts) 
ABRAHAM GOLUB 
Confidence Intervals for the Number Showing a Certain Characteristic in a Population when 
Sampling is without Replacement LEO KATZ 
BOOK REVIEWS 
PUBLICATIONS RECEIVED 
THE AMERICAN STATISTICAL ASSOCIATION INVITES 
AS MEMBERS ALL PERSONS INTERESTED IN: 
1. Development of new theory and method 
2. Improvement of basic statistical data 
3. Application of statistical methods to practical problems. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 21, No. 2 - April, 1953 


E. MALINVAUD........Capital Accumulation and Efficient Allocation of Resources 
MAURICE ALLAIS........Notes Theoriques sur l’Incertitude de l’Avenir et le Risque 
I. N. HERSTEIN AND JOHN MILNOR..An Axiomatic Approach to Measurable Utility 
JULIAN L. HOLLEY..A Dynamic Model: Part 2. Actual Model Structures and 
Numerical Results 
M. MERCIER........Comptabilite Nationale et Tableaux Economiques 
ROBERT H. STROTZ, J. C. MCANULTY, AND J. B. NAINES....Goodwin’s Nonlinear 
Theory of the Business Cycle—An Electro-Analog Solution 
ROBERT SOLOW AND PAUL SAMUELSON..Balanced Growth Under Constant Returns 
to Scale 
KARL BORCH......Effects on Demand of Changes in the Distribution of Income 
J. TINBERGEN AND H. M. A. VAN DER WERFF....Four Alternative Policies to 
Restore Balance of Payments Equilibrium—A Comment and an Extension 

Book Reviews, Announcements, Data on Members and Subscribers 


Published Quarterly Subscription rates available on request 
The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics. 
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to Rosson L. Cardwell, Acting Secretary, The Econometric Society, 
The University of Chicago, Chicago 37, Illinois, U. S. A. 





BIOMETRIKA 
A Journal for the Statistical Study of Biological Problems 
Volume 40 Contents Parts 1 and 2, June 1953 


On the range of partial sums of a finite number of independent normal variates. By ANIS, A. A. & LLOYD, 
E. H. The Total Size of a General Stochastic Epidemic. By BAILEY, N. T. J. Approximate confidence 
intervals. By BARTLETT, M. S. Estimating Parameters in Truncated Pearson Frequency Distributions 
without resort to Higher Moments. By COHEN, A. C. The Superposition of several strictly periodic 
Sequences of Events. By COX, D. R. & SMITH, W. L. On the utilization of marked specimens in estimating 
populations of flying insects. By CRAIG, C. C. Experimental evidence concerning contagious 
distributions. By EVANS, D. A. The effect of unequal group variances on the F-test for the homogeneity of group 
means. By HORSNELL, G. Incomplete and Absolute Moments of the Multivariate normal distribution 
with some applications. By KAMAT, A. R. On the Mean Successive Difference and its Ratio to the Root 
Mean Square. By KAMAT, A. R. Tests of Significance in a 2 x 2 contingency table: extension of Finney’s 
table. By LATSCHA, R. The estimation of population parameters from data obtained by the 
capture-recapture method, Part III. By LESLIE, P. H. & CHITTY, D. Estimation of a functional relationship. 
By LINDLEY, D. V. A Sequential Test for Randomness. By MOORE, P. G. Note on ‘The Jacobians of 
certain matrix transformations useful in Multivariate Analysis’. By OLKIN, I. A Method for Judging all 
Contrasts in the Analysis of Variance. By SCHEFFE, H. Tables of the Angular Transformation. 
By STEVENS, W. L. The Estimation and Comparison of Strengths of Association in Contingency Tables. 
By STUART, A. A problem of Interference between Two Queues. By TANNER, J. C. Miscellanea: Time 
Intervals between accidents, A Note on Maguire, Pearson & Wynn’s paper. By BARNARD, G. A. On a 
method of estimating Biological Populations in the Field. By CRAIG, C. C. Non-normality in two sample 
t-tests. By GRONOW, D. G. C. The Doolittle Method and the fitting of polynomials to weighted data. 
By GUEST, P. G. Note on the Poisson Index of Dispersion. By KATHIRGAMATAMBY, N. On an 
extension of Geary’s Theorem. By LAHA, R. G. A Rapid Method of Estimating the Correlation Coefficient 
from the range of the deviations about the reduced major axis. By LEIGH-DUGMORE, C. H. The 
effect of overlapping in bacterial counts of incubated colonies. By MACK, C. Further notes on the analysis 
of accident data. By MAGUIRE, B. A., PEARSON, E. S. & WYNN, A. H. A. A simple method of 
devising best critical regions similar to the sample space in tests of an important class of composite hypotheses. 
By RAO, K. S. 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 








MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 


JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B (Methodological) 
Contents of Volume 14, No. 1, 1952 


P. ARMITAGE ...The Statistical 
Theory of Bacterial Populations Subject to Mutation (with Discussion) 


DAVID G. KENDALL
On the Choice of a Mathematical Model to Represent Normal Bacterial Growth 
K. D. TOCHER....The Design and Analysis of Block Experiments (with Discussion)
S. C. PEARCE Some New Designs of Latin Square Type 
I. J. GOOD Rational Decisions
MARTIN SANDELIUS
A Confidence Interval for the Smallest Proportion of a Binomial Population 


A. M. WALKER.... Some Properties of the Asymptotic 
Power Function of Goodness-of-Fit Tests for Linear Autoregressive Schemes 


A. GARDNER Greenwood’s “Problem of Intervals”: An Exact Solution for N = 3 


The Royal Statistical Society, 4, Portugal Street, London, W.C.2. 





SKANDINAVISK 
AKTUARIETIDSKRIFT 


1952 - Parts 1 - 2 
Contents 


ERLING SVERDRUP.......Basic Concepts in Life Assurance Mathematics
J. WOLFOWITZ.....Consistent Estimators of the Parameters of a Linear Structural
Relation 

PER OTTESTAD..........On the Analysis of Variance of Percentage Fractions
J. LEFÈVRE...... Application de la Théorie Collective du Risque à la Réassurance
“Excess-Loss.”

KAI LAI CHUNG....On the Renewal Theorem in Higher Dimensions
ARNE JENSEN....A Short Remark on the Theory of Random Sampling and the 
Theory of Variance 

N. BLOMQVIST On an Exhaustion Process
P. WHITTLE Certain Nonlinear Models of Population and Epidemic Theory 
P. WHITTLE. . On Principal Components and Least Square Methods of 
Factor Analysis 


Annual subscription: 10 Swedish Crowns (Approx. $2.00). 
Inquiries and orders may be addressed to the Editor, 


SKARVIKSVAGEN 7, DJURSHOLM (SWEDEN) 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 11, Parts 3 and 4, 1951 


Preface. By P. C. MAHALANOBIS. Dynamic systems of the recursive type— 
economic and statistical aspects. By HERMAN O. A. WOLD. The applicability 
of large sample tests for moving average and autoregressive schemes to series of 
short length—an experimental study: Part 1: Moving average. By ABRAHAM 
MATTHAI AND M. B. KANAN. Part 2: Autoregressive series. By S. RAJA RAO
AND RANJAN K. SOM. Part 3: The discriminant function approach in the classi- 
fication of time series (Part III of statistical inference applied to classificatory 
problems). By C. RADHAKRISHNA RAO. On the estimation of parameters in 
a recursive system. By A. C. DAS. Bias in estimation of serial correlation co- 
efficients. By A. SREE RAMA SASTRY. Some moments of moment-statistics 
and their use in tests of significance in autocorrelated series. By A. SREE RAMA 
SASTRY. Elasticities of demand for certain Indian imports and exports. By 
V. NARASIMHA MURTI AND V. KASI SASTRI. Balance between income and
leisure. By M. V. JAMBUNATHAN. The use of commercial punched card ma- 
chines for statistical analysis with special reference to time series problems. By 
ABRAHAM MATTHAI. On simple difference sets. By T. A. EVANS AND H. B.
MANN. Bounds on the distribution of chi-square. By SHANTI A. VORA. On 
the limit points of relative frequencies. By D. BASU. Indian Statistical Institute: 
Nineteenth Annual Report: 1950-51 


Annual subscription: 30 rupees 
Inquiries and orders may be addressed to the 
Editor, Sankhya, Presidency College, Calcutta, India. 





